RELIABILITY, HOMOGENEITY AND NUMBER
OF CHOICES


N. L. GAGE and DORA E. DAMRIN* 


Bureau of Research and Service, College of Education,
University of Illinois 


Reliability data are indispensable to any full evaluation of test 
validity, either for practical purposes or for theorizing about 
true relationships between traits. The traits themselves often
depend, for evidence of their existence, on the demonstration
that they are reliably observable.

But theorizing about test reliability often goes on in terms of 
mathematical manipulations of statistical concepts that are 
somewhat divorced from reality. Theorists develop assumptions 
and constructs to simplify their formulas but in doing so may 
disregard the psychological data to which their formulas will 
be applied. This creates the place for empirical studies. How- 
ever pat the derivations, we still like to see how the formulas 
work when applied to typical data. Such investigations can 
tell us how close to reality the assumptions and definitions have 
stayed, or how much practical difference their being violated 
actually makes. 


SINGLE-TRIAL ESTIMATES OF RELIABILITY 


Our first major interest was in comparing those estimates of
reliability that can be made from single trials. Of the methods,
we shall consider only the Spearman-Brown corrected split-half
coefficient (S-B split-half), Guttman’s $L_4$, and the Kuder-Richardson
Cases III and IV (K-R III and K-R IV). Other formulas, such as
Rulon’s, Guttman’s $L_3$ and Jackson’s sensitivity (gamma), all reduce
to algebraic identities with, or mathematical functions of, one of
these (e.g., $\gamma = \sqrt{r/(1-r)}$); hence, they are omitted here.
Loevinger’s estimate of homogeneity will also be considered, although
it is not intended as an estimate of reliability in the usual sense.
We shall examine below the definitions of equivalence involved in
each of these methods.

* We have been helped by many criticisms and suggestions from Lee J.
Cronbach; however, full responsibility for interpretations and conclusions
rests with us.

Spearman-Brown corrected split-half coefficients. Here the
single test is split into halves (usually odd- and even-numbered
items) and scores on the halves are correlated. The halves are
assumed to be equivalent both with one another and with the
halves of a hypothetical equivalent form as follows:

$$\sigma_a = \sigma_b = \sigma_A = \sigma_B; \qquad r_{ab} = r_{aA} = r_{aB} = r_{bA} = r_{bB} = r_{AB}$$

where a and b are halves of the actual test, A and B are halves
of a hypothetical equivalent form.

Using this definition of equivalence we apply the Spearman-Brown
formula to estimate the reliability of the whole test, a + b,
as follows:

$$r_{(a+b)(A+B)} = \frac{2\,r_{ab}}{1 + r_{ab}}$$


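Guttman’s $L_4$. This formula, according to its author, allows
us to “dispense with assumptions of equivalence” (p. 275).
Cronbach has shown, however, that it can be derived by redefining
equivalence so that $\sigma_{a+b} = \sigma_{A+B}$ and
$r_{ab}\sigma_a\sigma_b = r_{aA}\sigma_a\sigma_A = r_{aB}\sigma_a\sigma_B = r_{bA}\sigma_b\sigma_A = r_{bB}\sigma_b\sigma_B$.
This redefinition of equivalence of an actual
test with a hypothetical one leads to the ‘lower bound’ formula:

$$L_4 = 2\left(1 - \frac{s_a^2 + s_b^2}{s_t^2}\right)$$

where $s_a^2$ and $s_b^2$ are the variances of the half-test scores and
$s_t^2$ is the variance of the total score. Guttman’s crucial assumption
is experimental independence of items within trials.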
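The two split-half estimates can be compared on the same pair of half scores. The following is a minimal sketch, not the computing routine used in this study; the function name and the six pairs of scores are hypothetical, and population (divide-by-N) variances are assumed throughout.

```python
from math import sqrt
from statistics import mean, pvariance


def split_half_estimates(half_a, half_b):
    """Return (Spearman-Brown corrected r, Guttman L4) for two half-test score lists."""
    mean_a, mean_b = mean(half_a), mean(half_b)
    # Population covariance between the two half scores.
    cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(half_a, half_b)) / len(half_a)
    var_a, var_b = pvariance(half_a), pvariance(half_b)
    r_ab = cov_ab / sqrt(var_a * var_b)
    # Spearman-Brown: 2r / (1 + r).
    spearman_brown = 2 * r_ab / (1 + r_ab)
    # Guttman L4: 2 * (1 - (s_a^2 + s_b^2) / s_t^2).
    var_total = pvariance([a + b for a, b in zip(half_a, half_b)])
    l4 = 2 * (1 - (var_a + var_b) / var_total)
    return spearman_brown, l4


# Hypothetical odd and even half scores for six examinees.
odd_scores = [10, 12, 15, 9, 18, 14]
even_scores = [9, 13, 14, 10, 16, 15]
sb, l4 = split_half_estimates(odd_scores, even_scores)
print(f"S-B split-half = {sb:.3f}, L4 = {l4:.3f}")
```

When the two halves have equal variances the two coefficients coincide; with a positive correlation between halves and unequal variances, $L_4$ falls below the Spearman-Brown value.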

Kuder-Richardson Case III. These writers define equivalence
of forms as interchangeability of items $i$ and $I$, $j$ and $J$, etc.;
the members of each pair have the same difficulty
($p_i = p_I$, $p_j = p_J$, etc.) and are correlated to the extent of their
reliabilities:

$$\frac{r_{iI}}{\sqrt{r_{ii}\,r_{II}}} = 1.$$

The inter-item correlations of one test are the same as those in the
other ($r_{ij} = r_{IJ}$, etc.). Since straightforward substitution of
these equalities into the formula for correlation between the two
equivalent forms leads to a need for unobtainable quantities like
$r_{ii}$, they make further assumptions to achieve a workable formula.
Among these are (1) that “item and test measure the same thing”
($r_{it}/\sqrt{r_{ii}\,r_{tt}} = 1$), (2) that all item intercorrelations
are equal, thus yielding a matrix of inter-item correlations which has
a rank of one ($r_{12} = r_{13} = \cdots = r_{n(n-1)}$), and (3) that item
variances are equal ($p_a q_a = p_b q_b$, etc.). The resulting
“formula 20” is:

$$r_{tt} = \frac{n}{n-1} \cdot \frac{\sigma_t^2 - \sum pq}{\sigma_t^2}$$

where $n$ = number of items
$\sigma_t^2$ = variance of test
$p$ = proportion passing each item
$q = 1 - p$

As Jackson and Ferguson have pointed out (p. 76), the assumption
of equal inter-item correlations and equal item standard
deviations imposes a further requirement, unstated by Kuder
and Richardson, that the items be equal or at most of two levels
in difficulty. Loevinger has concluded that the method therefore
“applies only to a case of no importance,” since complete satisfaction
of these assumptions leads to “a test on which everyone
scored either zero or perfect” (p. 11). The coefficient is
defensible, however, since its authors point out that “if the
assumptions are not met, the figures obtained are under-estimates”
(p. 159).
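The computation of “formula 20” from a matrix of item scores can be sketched as follows. This is an illustrative sketch only; the function name `kr20` and the small response matrices are hypothetical, and the population variance of total scores is assumed.

```python
from statistics import pvariance


def kr20(responses):
    """K-R Case III (formula 20) for rows of 0/1 item scores, one row per examinee."""
    n_items = len(responses[0])
    n_people = len(responses)
    var_t = pvariance([sum(row) for row in responses])
    # Sum of item variances p*q over all items.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (var_t - sum_pq) / var_t


# Everyone scores zero or perfect: the coefficient reaches 1.
all_or_none = [[0, 0, 0], [1, 1, 1]]
# A perfect cumulative pattern with items of unequal difficulty: it does not.
cumulative = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(kr20(all_or_none), kr20(cumulative))
```

The two hypothetical matrices illustrate Loevinger’s point: the coefficient equals unity only in the all-or-none case, and falls to .75 for a perfectly cumulative pattern whose items differ in difficulty.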
388 The Journal of Educational Psychology

Kuder-Richardson Case IV. The formula in this case is

$$r_{tt} = \frac{n}{n-1} \cdot \frac{\sigma_t^2 - n\bar{p}\bar{q}}{\sigma_t^2}$$

where $\bar{p} = \dfrac{\text{mean of the test scores}}{n}$ and $\bar{q} = 1 - \bar{p}$.

Here the assumptions are the same as for Case III with the
addition that all items are explicitly assumed to have the same
difficulty. Since this assumption is implicitly made in Case III,
as mentioned above, this formula represents merely, as Loevinger
puts it (p. 13), “carrying one step further the consequences
of assuming items of identical difficulty.” Estimates by K-R IV
have some merit in that the economy of effort in computing them
counterbalances the still greater underestimation obtained as
compared with K-R III. But their assumption of equal difficulty
implies that they will lead to relatively inaccurate estimates
of reliability for tests intentionally made to have items
with a wide range of difficulty.
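Because Case IV needs only the number of items, the mean, and the standard deviation, it can be checked from summary figures alone. The sketch below uses a hypothetical function name `kr21`; the inputs are the Form 2 total-group mean and standard deviation on the forty-five experimental items as reported in Table 1.

```python
def kr21(n_items, mean_score, sd_score):
    """K-R Case IV (formula 21) from number of items, mean, and standard deviation."""
    variance = sd_score ** 2
    p_bar = mean_score / n_items  # average item difficulty implied by the mean
    q_bar = 1 - p_bar
    return n_items / (n_items - 1) * (variance - n_items * p_bar * q_bar) / variance


# Form 2, total group: 45 items, M = 30.06, SD = 7.55 (Table 1).
print(round(kr21(45, 30.06, 7.55), 3))  # 0.844
```

The result agrees with the obtained K-R IV coefficient reported for Form 2 in Table 2.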

The Concept of Homogeneity. Loevinger has designed a
formula to express a property of tests which is somewhat different
from that approached by the above formulas and which she
calls ‘homogeneity.’ This is the degree to which all items of a
test measure the same ability or complex of abilities for all
individuals tested. A necessary (but not sufficient) condition
of perfect homogeneity is that all persons succeeding with items
at one level of difficulty should also succeed with all items at
lower levels of difficulty. Similarly, “when the items of a perfectly
homogeneous test are arranged in order of increasing
difficulty, every individual will pass all items up to a certain
point and fail all subsequent items” (p. 28). Subsequently
Loevinger pointed out that these requirements apply in this
form only to what she has denoted ‘cumulative’ as against
‘differential’ homogeneous tests.

Characteristics of tests which are closely related to Loevinger’s
‘homogeneity’ have frequently been discussed by others using
such terms as coherence, and unidimensionality or scalability.
Perhaps the first consideration of this characteristic
was by Walker, who in 1931 described a property of tests
which he called ‘higgledy-piggledyness.’ To measure this
property, Walker developed a ‘coefficient of hig’ which he
later found of dubious merit.

In 1937, Newcomb verbally described the same characteristic
of cumulative ‘homogeneous’ tests as applying to attitude scales:
“No scale can really be called a scale unless one can tell from a
given attitude that an individual will maintain every attitude
falling to the right or to the left of that point (depending on how
the scale is constructed)” (p. 897).

In 1941, Ferguson (p. 54) stated that “at present no convenient
quantitative measure is available for estimating the
divergence of an obtained answer-pattern matrix from a theoretically
unique matrix.” Later (p. 66) he describes a measure
of homogeneity which is equivalent, in Loevinger’s terms, to the
variance of the test divided by the variance of a perfectly homogeneous
test with the same item difficulties.

Loevinger’s index is an attempt at rational quantification of the
concept. We shall examine its relationship to the concepts of
‘reliability’ embodied in the other operational definitions considered
here. Her ‘homogeneity’ purports to be not another
type of ‘reliability’ coefficient but a partial alternative to it.
Her index relegates to error variance all of the following: group
factor variance, specific factor variance, and errors of measurement
of items. This means that a perfectly homogeneous test
is one whose variance is entirely attributable either to a
single general factor, or to “an approximately constantly
weighted sum of factors” (p. 522).

Loevinger points out that her index of homogeneity is only 
one of many possible ways of estimating this property of tests. 
Its sampling properties are as yet unknown. We have studied it 
here because no empirical studies of the index have yet appeared. 
Our assumption is that the interpretability of the statistic will be 
enhanced by a knowledge of how it varies with number of choices 
and test length, and of how its size compares with the coefficients 
yielded by single-trial estimates of ‘reliability.’ 

The estimate of homogeneity is

$$\text{est } H_t = \frac{V_t - V_{het}}{V_{hom} - V_{het}}$$

where $V_t$ = variance of the test,
$V_{het}$ = variance of a perfectly heterogeneous test with the
given item difficulties,
$V_{hom}$ = variance of a perfectly homogeneous test with the
given item difficulties.

It should be noted that it differs from K-R III by the omission of
the $\frac{n}{n-1}$ term, and in the denominator. In Loevinger’s terms,

$$\text{K-R III} = \frac{n}{n-1} \cdot \frac{V_t - V_{het}}{V_t}$$
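A minimal sketch of the homogeneity estimate follows. It rests on two assumptions not spelled out in the text: that the perfectly heterogeneous test has uncorrelated items, so that $V_{het} = \sum pq$, and that in the perfectly homogeneous (cumulative) test everyone passing a harder item passes every easier one, so that the covariance of two items is the smaller of their two $p$ values minus the product of the two. The function name and data are hypothetical.

```python
from statistics import pvariance


def loevinger_ht(responses):
    """Estimate of homogeneity for rows of 0/1 item scores, one row per examinee."""
    n_people = len(responses)
    n_items = len(responses[0])
    p = [sum(row[i] for row in responses) / n_people for i in range(n_items)]
    v_t = pvariance([sum(row) for row in responses])
    # Assumed: uncorrelated items, so the variance is the sum of item variances.
    v_het = sum(pi * (1 - pi) for pi in p)
    # Assumed: cumulative pattern, so cov(i, j) = min(p_i, p_j) - p_i * p_j.
    v_hom = v_het + 2 * sum(
        min(p[i], p[j]) - p[i] * p[j]
        for i in range(n_items)
        for j in range(i + 1, n_items)
    )
    # Undefined (division by zero) when v_hom equals v_het,
    # e.g. when every item but one is passed by all or by none.
    return (v_t - v_het) / (v_hom - v_het)


cumulative = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(loevinger_ht(cumulative), loevinger_ht(unrelated))
```

Under these assumptions a perfectly cumulative pattern gives an estimate of 1 and a pattern of unrelated items gives 0.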


TEST RELIABILITY AND NUMBER OF CHOICES 


The second major problem at which this study is aimed is the
relationship between single-trial estimates of reliability and
number of choices per test item. It has long been realized that
some of the error variance in multiple-choice tests of ability
arises from chance success in choosing the correct alternative
from those furnished in each test item. It should follow that
increased test reliability will result from an increase in the
number of functioning choices per test item, a choice being said
to function when it increases the r between correct choice and
total score.

The Spearman-Brown Formula. This hypothesis has already
been investigated in a series of studies by Remmers and his
co-workers. Three of these studies are pertinent here since
they involved especially designed experiments using ability
tests in which each item is scored zero or one. In these studies
Remmers investigated whether the changed reliability could be
predicted by means of the Spearman-Brown formula, with
change in test length conceived as the ratio of new number
of choices per item to old.
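The prediction described here amounts to entering the general Spearman-Brown formula with the ratio of choices in place of the ratio of test lengths. A brief sketch, with hypothetical function names and an illustrative starting reliability of .80:

```python
def spearman_brown(r, k):
    """General Spearman-Brown formula for a test lengthened by the factor k."""
    return k * r / (1 + (k - 1) * r)


def predict_for_choices(r_old, old_choices, new_choices):
    """Treat the ratio of new to old number of choices as the lengthening factor."""
    return spearman_brown(r_old, new_choices / old_choices)


# Illustrative only: a two-choice form with reliability .80 changed to four choices.
print(f"{predict_for_choices(0.80, 2, 4):.3f}")  # 0.889
```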

In their studies it is evident that the sets of four reliability 
coefficients, although they did not differ significantly (in terms 
of critical ratios) from the Spearman-Brown predictions, did 
not in all cases follow a curve similar to that obtained by pre- 
dicting with the Spearman-Brown formula from the two-choice 
form. This is especially true of House’s data, in which the 
four-choice form was much more reliable than the Spearman- 
Brown formula predicted and the five-choice form showed 
almost no increase in reliability over the four-choice form. These 
discrepancies indicate the desirability of further investigation. 

Two refinements of experimental procedure could be introduced 
to eliminate certain possible sources of the observed discrepancies 
between actual and predicted ‘reliability’ coefficients. The first 
is controlled rather than random elimination of misleads to 
convert five-choice items into four-, three-, and two-choice 
items. The second is administration of a ‘control’ test that 
would be the same for all groups, thus making possible empirical 
verification of the groups’ comparability on the experimental 
forms and adjustment of the obtained coefficients for differences 
in the ‘range of talent’ of the groups. These refinements are 
further described below. 

Lord’s Formula. Since the Spearman-Brown formula was
originally developed to express the correlation between sums and
differences, one cannot assume its relation to the correlation
between forms of a test differing only in number of choices per
item. Lord objected to its use in this way on the grounds
that items cannot be made perfectly reliable simply by adding
choices, and that the Spearman-Brown formula does not take into
account the difficulty of the test.

Lord has sought to avoid these difficulties by deriving an 
adaptation of the Spearman-Brown formula. Assumed in his 
derivation are (1) equal inter-item correlations (four-fold point 
r’s), (2) equal item difficulties, (3) equal plausibility of choices 
within each item to individuals not knowing the correct answer, 
(4) answers to all items by all individuals, and (5) values of 0 or 1 
for the choices in each item. 

It can be objected that the assumptions of neither the Lord
nor the Spearman-Brown formula will ever be met by actual
test situations. It is nonetheless relevant to the evaluation of
them, and to the theory underlying their development, to determine
which more closely predicts the changes in reliability found
empirically when changes in number of choices are introduced
into a given test.


PROCEDURE 


Our procedure in investigating the two major questions out- 
lined above may be described in terms of (1) the test used, (2) the 
subjects tested, and (3) the administration of the tests. 

The Test and Its Forms. In choosing the test to be used, we had
several considerations in mind. The test should be fairly homogeneous
in content so that application of Kuder-Richardson
formulas would be not altogether unjustified. It should be well
refined in terms of the discriminatory power and difficulty of
its items and their misleads, so that we could have some assurance
that all items and misleads were functioning. There should be
adequate data on the difficulty and discriminatory power of the
items and misleads so that some empirical basis could be used
in the construction of the various forms. It should be a power
test rather than a speed test since these formulas render spurious
estimates of $r_{tt}$ for speed tests. It should be an ability test
rather than a non-intellective test so that the assumptions of
certain of the formulas to be applied would be met. It should
be applicable over a fairly wide range of ability so that students
in both high-school and college classes could be tested. Finally,
the scores obtained should have some promise of usefulness to
the cooperating teachers and students to repay them for their
time and effort.

The test selected as meeting all these requirements was the
Ohio State University Psychological Test, Form 21, which has
been developed and refined over many years by H. A. Toops
and his collaborators. Of the one hundred fifty items only the
first ninety, comprising Parts 1 and 2, were used. The first part
consists of thirty same-opposite vocabulary questions illustrated
by the following:

Little is the SAME AS
1 coarse   2 small   3 prodigious   4 immense   5 feeble

The second part consists of sixty verbal analogies of the following
kind:

boy : boys :: man :
1 girls   2 men   3 man’s   4 men’s   5 gentlemen

These 90 items were arranged in order of difficulty on the basis
of item analysis data.* The odd-numbered items, after this
rearrangement, were designated ‘control’ items; they were left 
unchanged and were administered in the same form to all subjects. 
These ‘control’ items were intended to provide a means of 
verifying the comparability of the groups of subjects that were 
tested with ‘experimental’ items described below. 

The even-numbered items were treated as the ‘experimental’ 
test by having their number of choices varied. Four forms of 
this test were made as follows: Form 5 consisted of the even- 
numbered items with five choices, i.e., intact, as in the original 
test developed by Toops. Form 4 consisted of the same items 
with one of the false choices eliminated; thus each of the Form 4 
items had only four choices. Form 3 similarly had only three 
choices per item, i.e., two of the four original false choices were 
eliminated. Finally Form 2 had only two choices per item, 
three of the original four false choices having been eliminated. 

The dropping of false choices to make Forms 4, 3, and 2 was
done in terms of the item analysis data furnished by Toops.
To make Form 4, the experimental items were arranged in order
of difficulty. The items were then put into four groups, every
fourth item going into the same group. The distributions of the
item-test validity coefficients for these four groups were compared
and found to be similar. From the first of these groups,
the most popular mislead was eliminated; from the second group,
the second most popular mislead was eliminated, etc.

* We are grateful to Professor Toops for his permission to use this test
and for the item analysis data, based on 1,000 Ohio State University freshmen,
which he supplied.

To make Form 3, the same procedure was used except that 
only three groups of items, roughly matched on difficulty and 
validity, were formed. Of the remaining three misleads, the 
most plausible was dropped from the first group of items; the 
second most plausible, from the second group; etc. Form 2 
was made by a repetition of this procedure after the items had 
been divided into two groups. 

This procedure gave us some assurance that the forms would 
not differ sharply in respects other than the number of choices. 
The plausibility of the eliminated choices and hence of those 
remaining was controlled to produce a more representative 
manipulation of the items than chance alone might have yielded. 
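The controlled elimination of misleads can be sketched as a small routine. This is an illustration under stated assumptions, not a record of the clerical procedure actually followed: the data layout (a list of items already ordered by difficulty, each carrying a hypothetical mapping from mislead label to popularity) and the function name are invented, and groups are formed here simply by taking every n-th item, whereas the groups for Forms 3 and 2 were also roughly matched on validity.

```python
def drop_one_mislead(items, n_groups):
    """Remove one mislead per item; group g loses its (g + 1)-th most popular mislead.

    items: list of dicts ordered by difficulty, each with a 'misleads' mapping
           of mislead label -> popularity. Requires n_groups <= misleads per item.
    """
    result = []
    for position, item in enumerate(items):
        group = position % n_groups  # every n-th item goes into the same group
        ranked = sorted(item["misleads"], key=item["misleads"].get, reverse=True)
        to_drop = ranked[group]
        kept = {label: pop for label, pop in item["misleads"].items() if label != to_drop}
        result.append({**item, "misleads": kept})
    return result


# Hypothetical five-choice items (four misleads each), ordered by difficulty.
form5 = [{"id": i, "misleads": {"a": 40, "b": 30, "c": 20, "d": 10}} for i in range(4)]
form4 = drop_one_mislead(form5, 4)  # four groups, one mislead dropped
form3 = drop_one_mislead(form4, 3)  # three groups, a second mislead dropped
form2 = drop_one_mislead(form3, 2)  # two groups, a third mislead dropped
print([sorted(item["misleads"]) for item in form2])
```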

The Subjects. Approximately one thousand high-school and
college students were tested.* The high-school students were
the entire eleventh and twelfth grade classes of three high schools,
of which two were predominantly urban and the third mainly
rural. The college students were for the most part sophomores
and juniors enrolled in various sections of an introductory
educational psychology course. Of the answer sheets obtained,
only the nine hundred seventeen that had one and only one answer
to every item were used in subsequent analyses.

Administration of the Tests. The students were directed to
proceed through the questions in the order given, to attempt
every question, and to guess if necessary. These instructions
were intended to eliminate individual differences in ‘willingness
to guess’ as a possible non-intellectual factor affecting scores
and, in effect, made the test more constant for all subjects.
Furthermore, the 45-minute time limit proved sufficient to
allow more than ninety per cent of the subjects to attempt all
items. Since the incomplete answer sheets were eliminated from
the statistical analysis, the results are based on what was essentially
an unspeeded, power test for all subjects.

* For permission to test their students, we are grateful to Mr. James K.
Felts, Monticello H.S.; Miss Sally Fisher, Urbana H.S.; Miss Vera Kaden,
Champaign H.S.; Professors W. R. Dixon, Ray Simpson and Graham Pogue,
University of Illinois.

The tests were distributed to the students in rotated order so 
that the first student in any room received Form 5, the second 
Form 4, etc. This mechanically rotated distribution was 
intended to yield random selection of students, and hence com- 
parable groups, for each form. Scores on the ‘control’ items 
provided a means of ascertaining, and statistically adjusting for, 
such dissimilarity between these groups as still appeared. 


RESULTS 


Four scores (number right) were obtained for each student. 
Score 1 was based on the forty-five odd-numbered ‘control’ 
items. Score 2 was based on the forty-five even-numbered 
‘experimental’ items, which differed from form to form in number 
of choices. Scores 3 and 4 were based on odd and even halves 
of the ‘experimental’ items, respectively, to provide data for 
computing those reliability estimates (corrected split-half and
Guttman’s $L_4$) which require splitting the test.

‘Control’ Test Results. Table 1 shows the means and standard
deviations of the ‘control’ scores of the four groups. Also given
are the means and standard deviations of the scores on the
‘experimental’ items.

These figures show that the mean scores on the control items 
did not differ markedly from one group to another. Indeed, 
as the F-ratios of the analysis of variance indicate, the ‘between 
groups’ variance for the total group, far from being significantly 
large, is significantly smaller than chance would allow. This 
merely reflects the wide range of talent tested and the efficiency 
of the rotated distribution of forms in securing similar groups. 

As is to be expected, the mean scores on the experimental forms 
decrease regularly (except for the college students on Form 5) 
from Form 2 to Form 5. The forms increased in difficulty (or 
decreased in susceptibility to chance success and to success 
through partial knowledge) as the number of choices increased. 
Since we wished in any case to make adjustments in the reli- 
ability estimates for such variations in ‘range of talent,’ we did 
not apply tests of significance to the differences among standard 
deviations of the four ‘form-groups’ on the control items.

TABLE 1. MEANS AND STANDARD DEVIATIONS ON ‘CONTROL’
AND ‘EXPERIMENTAL’ ITEMS

(F-ratios are the results of the analysis of variance between
groups on the control items.)

Test               Control   Control   Exper.   Exper.
Form        N         M         SD        M        SD

Total Group (F = .02, p > .05)
  2        241      18.32     10.20     30.06     7.55
  3        223      19.24     10.03     24.55     9.12
  4        228      18.33     10.52     21.23    10.49
  5        225      18.56     10.25     19.23    10.40

High-school Juniors (F = 1.81, p > .05)
  2         84      13.26      6.66     26.73     5.76
  3         80      13.05      6.35     20.62     7.33
  4         85      13.33      7.95     16.67     8.80
  5         84      15.35      7.80     15.89     7.93

High-school Seniors (F = .70, p > .05)
  2         96      15.48      7.89     28.60     6.96
  3         81      15.99      8.00     22.59     8.20
  4         82      16.46      9.50     19.68    10.07
  5         87      14.97      7.97     15.79     8.95

College Students (F = .70, p > .05)
  2         61      29.74      8.63     36.93     6.17
  3         62      27.87      9.60     32.16     7.71
  4         61      27.82      8.63     29.67     7.98
  5         54      29.37      9.27     29.96     8.52


Comparisons of ‘Reliability’ Formulas. In Table 2 are given
the various coefficients obtained on the ‘experimental’ test.
On the left-hand side are the coefficients actually obtained;
on the right are the coefficients adjusted to constant variability,
or ‘range of talent.’ The standard deviation of the Form 2
group on the control items was used as the ‘anchor’ to which the
reliabilities of the experimental forms taken by the other groups
were adjusted. In comparing the adjusted coefficients shown in
Table 2, we can see that (1) the Spearman-Brown corrected odd-even
coefficients are all slightly larger than those obtained by
Guttman’s $L_4$, (2) the K-R III coefficients are all larger than
those obtained by K-R IV, (3) the Spearman-Brown corrected
odd-even coefficients are larger than those by K-R III except
for Form 5, (4) the K-R III coefficients are not consistently
related to the $L_4$’s, being slightly larger in two cases and slightly
smaller in two.

TABLE 2. COEFFICIENTS OF ‘RELIABILITY’ FOR THE FOUR
EXPERIMENTAL FORMS

S-B = Spearman-Brown corrected odd-even; L4 = Guttman L4;
K-R III, K-R IV = Kuder-Richardson Cases III and IV.

                                              Adjusted for Variability
                   Obtained                       on Control Test
Form     S-B     L4    K-R III  K-R IV     S-B     L4    K-R III  K-R IV

Total Group
  2     .867    .858    .863    .844      .867    .858    .863    .844
  3     .906    .901    .898    .886      .909    .904    .902    .890
  4     .934    .929    .928    .918      .930    .925    .923    .913
  5     .926    .923    .928    .918      .925    .922    .927    .917

High-school Juniors
  2     .797    .796    .737    .688      .797    .796    .737    .688
  3     .854    .852    .831    .810      .867    .865    .846    .827
  4     .901    .900    .898    .884      .859    .858    .855    .835
  5     .855    .854    .870    .855      .801    .800    .822    .801

High-school Seniors
  2     .814    .814    .828    .803      .814    .814    .828    .803
  3     .869    .867    .871    .852      .865    .863    .868    .848
  4     .918    .911    .920    .911      .881    .871    .884    .871
  5     .909    .907    .905    .892      .907    .905    .903    .890

College Students
  2     .859    .846    .865    .845      .859    .846    .865    .845
  3     .899    .890    .886    .865      .875    .864    .859    .833
  4     .911    .906    .886    .860      .911    .906    .886    .860
  5     .903    .893    .905    .882      .888    .877    .891    .864
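The text does not state the formula used to adjust for ‘range of talent.’ A minimal sketch, assuming the familiar correction that holds the error variance constant when the standard deviation changes, is given below; the function name is hypothetical. Applied to the control-item standard deviations of Table 1, with the Form 2 group as anchor, this assumed formula appears to reproduce adjusted entries of Table 2.

```python
def adjust_for_range(r_obtained, sd_group, sd_anchor):
    """Adjust a reliability coefficient to the anchor group's variability.

    Assumes the error variance sd^2 * (1 - r) is the same in both groups.
    """
    return 1 - (sd_group / sd_anchor) ** 2 * (1 - r_obtained)


# High-school juniors, Form 4: obtained S-B = .901, control SD = 7.95;
# anchor (Form 2 juniors) control SD = 6.66.
print(round(adjust_for_range(0.901, 7.95, 6.66), 3))  # 0.859
```

When the group’s control-item standard deviation equals that of the anchor group, the coefficient is left unchanged, as in the Form 4 row for college students.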

The results shown in Table 2 do more than provide an empirical 
demonstration of known mathematical relationships between the 
several formulas. Of practical importance to the test-user in 
any field is the magnitude of the differences between various 
kinds of estimates of reliability. It will be noted that for a
test of this type the ‘underestimate’ yielded by Guttman’s $L_4$
is so slight as to be unimportant in practice. Likewise, the
differences between the two types of K-R coefficients are, for 
most practical purposes, not of sufficient size to justify the extra 
labor involved in the computation of K-R III. The latter 
finding is especially interesting in view of the wide variation 
in the difficulty of these test items. 

Although the estimates obtained by the split-half procedure 
are for the most part higher than those by the K-R formulas, 
this in no way establishes the superiority of this method. Such 
a finding is the result of chance, for the size of the coefficient 
obtained for any specific split is a function of the ratio of the sum 
of the half-test covariances to the inter-half covariances. If 
our split happens to be such as to maximize the within-half 
covariance terms and minimize those between halves, our 
coefficient will be low; if the reverse occurs the coefficient will be 
appreciably higher. 

Comparison of Forms Differing in Number of Choices. If the
adjusted coefficients shown in Table 2 are compared from form
to form, it is seen that they rise regularly from Form 2 (two-choice)
to Form 4 for the total group. In two of the coefficients
(S-B odd-even and $L_4$) there is a slight drop in Form 5 from Form
4 while in the other two (K-R III and K-R IV) there is a slight
continued rise from Form 4 to Form 5. In the subgroups by
educational level, the trends are not so regular. Among the
high-school juniors, the split-half coefficients (S-B odd-even and
$L_4$) are highest for Form 3 while the K-R III and IV are highest
for Form 4. Among the high-school seniors, all coefficients
regularly increase from Form 2 to Form 5. Among the college students,
the peak is hit at Form 4 for the split-half coefficients and at
Form 5 for the K-R’s.

How are these trends to be interpreted? In the first place, the
general hypothesis that reliability increases as number of choices
is increased tends to be supported by these data. When it is
remembered that the K-R coefficients are unique estimates and
depend in no way on judgment or chance in test-splitting, confidence
in the hypothesis is further bolstered. Thus, the failure
of the Form 5 split-half coefficients to show continued increase
over Form 4 merely reflects ‘bad luck’ in splitting this test into
equivalent halves.

Secondly, the level of ability of the group tested may be related 
to the number of choices at which maximum reliability is obtained. 
The fact that the highest K-R coefficients appear at Form 4 for 
high-school juniors but at Form 5 for both high-school seniors 
and college students indicates that the addition of choices may 
indeed lower reliability by increasing the difficulty of the test 
beyond the optimal point for the level of ability of the group. 

Comparison of Lord and Spearman-Brown Formulas.—We have 
already described the two formulas that have been suggested for 
predicting change in test reliability with change in number of 
choices per test item. We shall evaluate them in terms of how 
closely their predictions accord with our adjusted K-R III 
coefficients. 

Figure 1 shows the adjusted coefficients for all students by the 
K-R III formula, and those predicted from Forms 2, 3, and 4 
by the Spearman-Brown and Lord formulas. 

In predicting from Form 2, the Spearman-Brown is more 
accurate than Lord’s formula. This is especially true of the 
predictions to Forms 3 and 4 from Form 2. 

In predicting from Form 3 to Forms 4 and 5, however, the 
two formulas yielded the same estimate for Form 4 and Lord’s 
formula was slightly more accurate for Form 5. 

Finally, in predicting from Form 4 to Form 5, the Lord formula 
was slightly more accurate. Thus neither formula emerges 
with a clear-cut superiority. 

We offer no rationalization for the failure of Lord’s formula, 
especially derived for the use here made of it, to predict changes 
in reliability more closely than the Spearman-Brown formula. 
It is evident that Lord’s assumptions are not met by our test. 
Nor would they be much more closely approximated by any 
mental ability test now in use. It is tempting therefore to 
ascribe its comparative failure to its unrealistic assumptions. 

But the use of the Spearman-Brown formula is also vulnerable
since it was not derived for the use here made of it and since no
rationale for its relative success has yet been offered. In increasing
number of choices, we seem to do nothing strictly analogous
to increasing number of items, number of minutes, or number of
judges, to all of which the Spearman-Brown should and does
apply. Yet the fact remains that when a two-choice test is
changed to a three-choice or a four-choice test, we obtain reliability
estimates very similar to those we would get if we had
multiplied the number of items by 1.5 or by 2.

Figure 1. Obtained and Predicted Kuder-Richardson Case III Coefficients
for Different Numbers of Choices.

The practical significance of these findings for test builders is 
that test ‘reliability’ can be increased as predictably by increasing 
number of choices as by increasing number of items. If the 
author of a fifty-item two-choice test finds it has a reliability, 
by any of the formulas, of say .82, and wishes to increase this 
figure to, say, .90, he can choose between writing another fifty 
two-choice items, maximally equivalent to those already written, 
or writing an additional two valid choices for each already 
written item. 
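The arithmetic behind this illustration can be written out with the Spearman-Brown formula, taking the lengthening factor $k = 2$ for either doubling the items or doubling the choices (the symbols $k$, $r$, and $r^*$ are introduced here only for the illustration):

$$r_k = \frac{k\,r}{1 + (k-1)\,r} = \frac{2(.82)}{1 + .82} = \frac{1.64}{1.82} \approx .90$$

Solving the same formula for the factor needed to reach a desired reliability $r^* = .90$ gives

$$k = \frac{r^*(1 - r)}{r\,(1 - r^*)} = \frac{(.90)(.18)}{(.82)(.10)} = \frac{.162}{.082} \approx 1.98,$$

that is, very nearly a doubling.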

The difficulty and validity of the choices added would need 
to be carefully considered since they might add nothing if they 
were less plausible and valid than the choices already present 
in the items. But similar considerations would apply to what- 
ever new items might be added. The decision between these 
paths to higher reliability might then depend on whether it was 
more feasible to add choices or to add items without changing 
the factorial composition of the test. 

Loevinger’s Homogeneity Formula.—To throw light on its 
behavior with a test of the kind for which it was designed (a 
power test of ability) we have computed Est H_t for our various
forms. 

The values of Est H_t (unadjusted for range of talent) in Table 4
show that this statistic increased in size as the number of choices 
increased, up to four choices, just as did the unadjusted K-R III 
reliability estimates. This suggests that, as the error of measure- 
ment of the individual items was decreased, a greater proportion 
of the total variance went into the general factor variance of the 
test. 

But Est H_t did not increase regularly as the number of items
increased from twenty-two to forty-five to ninety, for Form 5. Thus
Est H_t increased from .265 to .311 for the change from twenty-two
to forty-five items (Set A) but decreased from .311 to .307 for
ninety items. K-R III, on the other hand, increased regularly
from twenty-two to ninety items. (It is noteworthy in passing
that the changes in K-R III values from that for twenty-two
items to those for forty-five and ninety items are in very close
accordance with the Spearman-Brown formula.)


TABLE 4. VALUES OF LOEVINGER’S Est H_t AND K-R CASE III
(NOT CORRECTED FOR RANGE OF TALENT) FOR TESTS
DIFFERING IN NUMBER OF CHOICES PER ITEM, AND IN
NUMBER OF ITEMS PER TEST

                              Loevinger’s    K-R
Type of Test                  Est H_t        Case III

45-item, 2-choice             .182           .863
45-item, 3-choice             .229           .898
45-item, 4-choice             .311           .928
45-item, 5-choice             .311           .928
22-item, 5-choice             .265           .843
45-item, 5-choice (A)         .311           .928
45-item, 5-choice (B)         .302           .926
90-item, 5-choice (A + B)     .307           .962
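
As a rough check on the parenthetical remark about K-R III and the Spearman-Brown formula, the sketch below steps the twenty-two-item coefficient in Table 4 up to forty-five and ninety items and sets the predictions beside the obtained values. The variable names are illustrative; only the coefficients come from the table.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test of reliability r is lengthened n-fold."""
    return n * r / (1 + (n - 1) * r)


# K-R III values for the five-choice forms, copied from Table 4.
base_items, base_r = 22, 0.843
obtained = {45: 0.928, 90: 0.962}

# Predict each longer form from the 22-item coefficient.
predicted = {k: spearman_brown(base_r, k / base_items) for k in obtained}

for k in sorted(obtained):
    print(k, round(predicted[k], 3), obtained[k])
```

The predictions fall within about .01 of the obtained coefficients, which is the sense in which the agreement is "very close."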


When the Kuder-Richardson coefficient and Loevinger’s 
homogeneity index are computed on the other half (B) of the 
ninety-item test, the two estimates do increase together. This 
indicates that while Est H_t and K-R III need not vary together,
it is possible for them to do so. Furthermore the numerical
values of Est H_t are much lower than those of the ‘reliability’
coefficients for the same tests. 

These results show that, although proposed as a ‘partial alter- 
native’ to reliability, homogeneity does not reflect increased test 
length as reliability estimates do. This is perhaps understand- 
able in that homogeneity depends on the persons’ consistency in 
(1) failing all items requiring ability beyond a point of maximum 
difficulty and (2) passing all items below that point. If, as the 
number of items increases, the differences in difficulty between 
items become smaller, such consistent behavior becomes less 
probable, and the values of H would decrease. It follows that, 
of two tests differing only in their distributions of item difficulty,
the one with the greater variability in item difficulty should
yield the higher H.

We have looked into this possibility by computing H_t for
two sets of nineteen items from our five-choice form. The
distributions of difficulty (p) of the two sets differed as shown
below:

Difficulty        Number of Items
    p             Set A    Set B

.60-.69             3
.50-.59             3
.40-.49             3       10
.30-.39             3        9
.20-.29             3
.10-.19             3
.01-.09             1

Total items        19       19
No. of cases      225      225
Mean             7.38     7.64
S. D.            4.47     5.14
Est. H_t         .345     .310


The difference in H_t between Sets A and B, although apparently
not great, is in the predicted direction. In the absence of
sampling statistics for H_t, we have, of course, no adequate way
of judging the significance of this difference.

It is evident that H, although it has limits of zero and unity for
perfect heterogeneity and homogeneity, respectively, does not
have the same interpretability as do reliability coefficients. H_t
was much lower than r_tt for a given test. How high H_t should
be to indicate a ‘high’ degree of homogeneity we have as yet
no way of knowing. What is needed is some kind of ‘norms,’
arrived at either empirically or through derivation of the
‘standard error’ of H_t, which will give us a basis for interpreting
values of H_t.


SUMMARY


1) This research deals with two problems: (1) differences 
between single-trial estimates of reliability and homogeneity,
and (2) differences between two formulas for predicting change 
in reliability with changes in number of choices per test item. 

2) The procedure was to administer four forms of Parts 1 
and 2 of the O. S. U. Psychological Test, the forms differing in 
number of choices per test item, to four groups of high-school 
and college students. Values of the corrected odd-even, Gutt- 
man Ly, Kuder-Richardson Case III and Case IV, and Loevinger 
homogeneity statistics were computed from the resulting data. 
Adjustments were made for range of talent to make the reliability 
estimates comparable from form to form; controlled rather 
than random elimination of choices was used to make the various 
forms. 

3) The differences between the obtained single-trial estimates 
of reliability were in the directions expected from analyses of the 
derivations of the formulas. More importantly, the differences 
here obtained with a fairly typical power test of ability were too 
small in size to be of practical significance. 

4) The differences between the formulas (Lord and Spearman- 
Brown) for predicting change in reliability with number of 
choices did not clearly indicate the superiority of either. The
accuracy of both compared favorably with that of the Spearman- 
Brown when applied to number of items. 

5) The Loevinger homogeneity index increased as did the 
reliability estimates as number of choices was increased. Its 
numerical value was much smaller than that of the reliability 
coefficients. There was no basis for judging the psychometric 
or statistical significance of the values obtained. That is, did 
they indicate a relatively heterogeneous or homogeneous test?


GROUP GUIDANCE IN THE PROGRAM OF A
READING LABORATORY 


GLADYS L. PERSONS and MARY H. GRUMBLY 
University of Bridgeport, Bridgeport, Conn. 


The purpose of a reading laboratory or clinic is the rehabilita- 
tion of retarded students in skills of reading, study habits, self- 
confidence and personal development. Naturally, guidance 
should play a large part in this program. At the Laboratory of 
the University of Bridgeport, such a program has been initiated. 
The procedures and results will be discussed in this article. 

Of the twelve principles and practices of Guidance as set 
forth by Cox, Duff and McNamara, it is apparent that six are 
fundamental to the philosophy of a reading laboratory and are 
in effect in varying degrees according to the size and duties of 
the staff. These six principles of guidance are:* 

1) Guidance consists in helping pupils to set up objectives 
that are for them dynamic, reasonable, and worth while, and 
in helping them, so far as possible, to attain these objectives. 

2) The major fields in which guidance is necessary are health, 
vocation, avocation, education, and human relations. 

3) The idea of guidance is inherent in all efforts to educate. 

4) The kind and amount of guidance needed varies greatly 
with different children and in different situations and at different 
times. 

5) A research and measurement program is an essential part 
of successful guidance work. 

6) The proper adaptation of curriculum and method to the 
needs of individual pupils is best promoted through guidance 
activities of teachers working in a democratically organized 
school system. 

* Cox, Duff and McNamara. Basic Principles of Guidance, New York:
Prentice-Hall, Inc., 1948. Quoted by permission of the Publishers.

That these principles are in operation in the Reading Laboratory
of the University of Bridgeport is evidenced by the importance
placed on the case history in the diagnosis, the conferences
with one or both parents, conferences with school principals and
guidance directors and the eager acceptance of all school records
and records of previous tests. It is apparent, too, in the testing
which precedes instruction of each and every student. The
following tests are administered (forms of some being adapted 
to age and achievement of individual tested):

Test                                Purpose of the Test

 1) Stanford-Binet or Wechsler-     to determine intelligence quotient
    Bellevue Test of Mental         and to gain insight into
    Ability                         personality
 2) Draw-a-Person                   to gain additional insight into
                                    personality
 3) Iowa Silent Reading or          to determine the silent reading
    Stanford Silent Reading or      level
    Gates Primary or Reading
    Readiness
 4) Vocabulary (Inglis)             to determine word knowledge
 5) Standardized Spelling           to determine ability to spell and
                                    obvious causes of spelling
                                    difficulties
 6) Telebinocular, a screening      to refer to an ophthalmologist if
    test of vision                  results are unsatisfactory
 7) Gray Oral Check                 to determine the level of oral
                                    reading and types of errors
 8) Dominance                       to determine if left-sidedness or
                                    mixed dominance is present
 9) Ophthalmograph                  to get a picture of eye movements
                                    during reading and of rate of
                                    reading
10) Achievement Test in common      to study growth and achievement
    school subjects                 of individual


Once the child has been accepted as a student, there is an 
attempt to see that his health and personality needs are met. 
There are interviews with doctors, eye and ear specialists and 
psychiatrists; contact is made with any and all who have been 
ministering to the child. Efforts are made to develop apparent 
talents and to nourish aggressive tendencies where only
self-distrust and timidity exist. Instruction is planned to meet
individual needs and re-testing at intervals to measure gains
takes the place of subject-matter examinations and reports. 

In order to meet individual needs in instruction, students are 
grouped according to their reading level and, whenever possible, 
according to their stage of physical and social growth. Because 
all students have some history of retardation before coming to 
the Reading Laboratory, there are no students below the age of 
eight and seldom any above the age of twenty in the morning or 
what is known as the full-time session. Student groups vary in 
number from one to eight or ten; they work with materials of 
interest to their age and of difficulty suitable to their reading 
level. 

One typical group is that of five twelve-year old boys who have 
advanced in regular school as far as Grade IV but because of 
reading disability have little hope of going ahead. They are, 
therefore, in the Reading Laboratory. Two of them have 
better than average intelligence quotients; two are average. 
One is probably a mental defective whose personality makes up 
for much. (Those dealing with this student feel that once skills 
of reading are acquired, his placement in the intelligence test 
may be changed.) These boys are talking about vocational 
plans already and evince interest in occupations and their own 
capacity for profiting by training in certain vocational fields. 

Every reading laboratory has a group of adolescent boys who 
were pushed through grade school and who then realized with 
a shock (as did their parents) that they were not prepared for 
high school even though they had been handed a grade-school 
diploma. Such sixteen-year-olds have constantly in mind the 
importance of preparing to earn a living. In addition to
developing reading skills, they want to know what fields they may hope
to enter, where they may obtain training for the vocation of 
their choice, and what the educational requirements are. 

A typical evening group of students in the Reading Laboratory 
is composed of young working men who hope for advancement in 
their places of employment or who hope to take advanced courses 
which will lead them into other fields of work, both more interest- 
ing and more profitable. A realization that lack of reading skills 
is keeping them from their desired goals has brought them to the 
Reading Laboratory. Although through their daily experiences 
they develop knowledge of the working world, they want some
help in broadening their understanding of fields of work and of
those qualities which contribute to advancement. 

The work of a reading laboratory, therefore, could be strength- 
ened by placing more emphasis on educational and vocational 
guidance. Vocational interest could conceivably provide the 
motivating force in accelerating gains in reading skills, but, 
above all, knowledge of the working world is due these students 
because of the imminence of earning a living and assuming the 
duties of citizenship. They should be given some idea of their 
responsibilities as citizens and some idea of how our government 
works and of how much reading skill they need for their chosen 
work and for citizenship. 

Teen-age boys and girls, following re-training in reading, should
have at least a tentative vocational goal, and should leave the 
reading laboratory for a school where they may expect training 
which will help them attain that goal. Therefore, some voca- 
tional aptitude testing should be planned for this group—not 
necessarily within the reading laboratory itself.

A study of occupations, if introduced into the program of 
the reading laboratory, could contribute to the reading program, 
widen the horizons of these boys and girls as their reading skills 
are being developed, and fortify them with knowledge of them- 
selves and the world of work which will help them and their 
parents to select the schools which they will enter following 
training in reading. Any such courses, however, must contribute
as well to the development of reading skills and must be
planned to harmonize with the philosophy and method of the 
reading program. 

The schools in the area of the reading laboratory should be 
known to the laboratory, particularly those schools where stu- 
dents may build on the reading skills developed in the reading 
laboratory. It is important to know, also, the schools where 
vocational training is given. Schools which develop special 
talents and yet do not require a high-school diploma, should be 
known to the laboratory. There should be knowledge, too, of 
educational advisement services and of the most readily available 
and reliable vocational advisement and testing service. 

It appears that a group guidance course would be the logical 
means of helping the students of a reading laboratory develop 
their understanding of the world of work, develop some skill
in the scrutiny of self and some ability to determine means and
places for self-improvement. It should be the aim of such a 
course to bring about the increasing ability for self-guidance 
and self-direction which will stand by the student after he has 
left the relatively protective atmosphere of the reading laboratory. 

In February of last year, such a course was introduced into 
the program of the Reading Laboratory at the University of 
Bridgeport. It was offered to a group of five adolescent boys, 
only one of whom had been a regular high-school student. 
These boys were at ease with one another, thus presenting no 
problems as to free discussion. They were thinking in terms 
of the future, and were, therefore, already motivated. The 
teacher had had training in the techniques of group guidance 
and had had experience in group work. 


DATA ON A REPRESENTATIVE CLINIC GROUP 


It was at the beginning of the new semester in February, 
that the course in group guidance was introduced. It was 
planned for two half-hour periods per week, coming the last 
period on Tuesday and Thursday. The group was programmed 
to read history with the same teacher at the same time on 
Monday and Wednesday, thus maintaining contact. 

There follows a description of each student in the group: 


I 


Martin, who is sixteen years old, has been a tenth-grade 
student in a nearby high school where his grades have been D’s 
with an occasional C. Both the Principal and the Guidance 
Director of the school and also his mother are interested in 
Martin, who is preparing for a business career. His mother has 
migraine headaches, and frequently withdraws from her children, 
insisting on quiet in the home. The father is frequently away 
from home, and when at home is unsympathetic toward the 
children, who he feels are not a credit to him.

Test results for Martin indicated that his intelligence quotient 
places him in the dull-normal range; his memory is reliable and 
even. He is immature emotionally and does little thinking for 
himself. He is high in acuteness of observation and ability
to handle concrete problems; low in practical judgment. He has
satisfactory vision at near point. His dominance is completely 
right. In oral reading he is slow and inaccurate and in silent 
reading he places on grade level 5.4—thus evidencing a retarda- 
tion of five years. He reads simple material at 172 words per 
minute with good comprehension. The psychologist concluded 
that while Martin does not appear to have much capacity for 
educational tasks involving verbal problems, he appears to be 
able to do tasks involving design, mechanical drawing and other 
concrete skills. 


II 


Kevin, aged sixteen, is the son of a mother who is a college 
graduate and of a father who is a successful business executive. 
His sister is in college and is doing well. Kevin was very ill 
as an infant, but has had no serious illness since he was six months 
old. Kevin never learned to read. He repeated the first grade; 
has had his school situation changed frequently; has been 
tutored, and yet, he finally reached the eighth grade unable to 
read. He came to the Reading Laboratory in March, 1948, for 
training. 

The tests administered indicate that his intelligence quotient 
is in the average range. He is emotionally immature, very 
unsure of himself and gives up easily. His test of vision indi- 
cated some difficulty at near point but the ophthalmologist 
does not believe that a correction is indicated. His dominance 
is chiefly right, although he does some things with his left hand. 
Oral reading of very simple material is accomplished in a pain- 
fully slow manner with much stammering; in silent reading he 
placed on grade level 3.2 and his graphs show reading of simple 
material at 128 words per minute with moderate comprehension. 

Parental plans for Kevin are to have him enter high school for 
low-level students where he will take some commercial subjects 
and some trade training courses. Eventually, he will go into his 
father’s business, which is the selling of brick. 


III 


Charles is fifteen and has been failing in all subjects in the 
eighth grade. He repeated the seventh grade. His trouble with
reading began in the fourth grade. He has never been a behavior
problem. His family find him a helpful member of the group 
but a reticent one. He has one brother, age seventeen, who is 
getting along fairly well in high school. 

Charles’ intelligence quotient is in the average range. He has 
a poor memory. Emotionally he is very immature and suffers 
greatly from feelings of inadequacy and insecurity. His test of 
vision shows trouble with lateral imbalance. His dominance is 
completely left, but he uses his right hand in golf. His oral 
reading is slow and punctuated with errors; his placement on a 
silent reading test is 5.7 and he reads 66 words per minute with 
good comprehension. 

Charles’ family would like to have him enter a vocational 
high school, but Charles has told his brother that he wants to 
go to a regular high school. 


IV 


Gilbert is sixteen. His mother is a college graduate; his father 
attended a university. His sister is in college at present. His 
brother, who is older, is a non-reader; he is successfully operating 
a farm. They have a good home life although there was much
tension in the home during the depression years. Gilbert repeated 
the first grade and has since been pushed ahead until he reached 
the seventh grade without any significant accomplishment to his 
credit. However, he was always excellent in drawing. 

Gilbert’s intelligence quotient is in the dull normal range. 
He is a slow thinker and a slow learner. There are no indications 
of emotional disturbance. His vision is satisfactory at both near 
and far point. His dominance is completely right. In oral 
reading, he is very slow and inaccurate. On a silent reading 
test, he places at grade level 3.4 and he reads 100 words per 
minute with ninety per cent comprehension. 

Gilbert is interested in carpentry and masonry and does such 
work around his home and on his brother’s farm. He has good 
work habits; he is very reliable and systematic. He handles 
money well. His ability to draw is marked. The instructor 
from the Art Department of the University who has had this 
group of boys for sketching, says that Gilbert is very intelligent 
about following directions and that his achievement in this class 
is outstanding.


V


John is sixteen and the member of a very cohesive family. 
His sister is in college and his twin brother is in high school. 
John progressed through the grades of a private school without 
significant accomplishment and came to the Reading Laboratory 
in January, 1948, practically a non-reader. 

Tests show his intelligence quotient in the average range. 
Marked emotional disturbance, insecurity and anxiety are 
present. He is aggressive, egocentric and somewhat hostile. 
His vision is satisfactory. In dominance, he is completely left. 
His oral reading is slow with many errors. At the time John 
became a member of the group guidance class, his silent reading 
was on a 5.2 grade level, his spelling on 4.7. He was reading 207 
words per minute with excellent comprehension. 

John has an aptitude for business and has done much work for 
his father, who is a building contractor. He understands money 
and the spirit of competition. He owns and drives a car. He is
getting training in the various building trades through the work 
he is doing; the training is evidently planned. He can read blue 
prints to some extent, but both he and his family expect him to 
take some technical training in the drawing of plans. John has a
pleasant manner but is deficient in a sense of social responsibility. 




These five individuals are a typical reading laboratory group. 
They all need help in adjusting to the world of work as well as 
training in reading. Fundamentally, they have the same needs, 
but as individuals differ sufficiently so that they will benefit from 
the interplay of group thought and group discussion. 

The major purposes of this group guidance course were deter- 
mined as: 

1) To give to the student certain basic concepts concerning 
responsibilities of citizenship, acceptable social behavior, and success
in any occupation.

2) To aid the student in obtaining adequate and accurate 
information about schools and courses so that he may have facts 
on which to base his decisions. 

3) To present to the student a method of investigating occupations.

4) To assist the student in making tentative plans for the
future. 

To achieve the first purpose, lessons were planned using Unit 
Number 1 of the Occupations Course by R. Floyd Cromwell and 
Morgan D. Parmenter, published by Guidance Publishing Com- 
pany, Limited, and distributed by the Psychological Corporation. 
Each student was provided with a text notebook and instructed 
in its use. 

The instructor developed worksheets according to the methods 
of the Reading Laboratory which has a definite philosophy under- 
lying its reading instruction. These worksheets were attached 
by each student to his text notebook, being placed following 
the chapter on which each was based. 

The lessons based on the Cromwell and Parmenter text note- 
book (twenty-four in all) were carried on through group dis- 
cussion, silent reading in class followed by tests of comprehension 
and oral reading. The problems selected for discussion and the 
cases analyzed by the group were chosen with the needs and 
interests of these particular boys in mind. 

The lessons on occupations were developed using material 
published by the Bureau of Vocational Guidance, Connecticut 
Department of Education. Worksheets to be used with each of 
ten job analyses written for junior high-school students were 
prepared by the instructor. These worksheets were designed to 
accord with the philosophy and method of the Reading Labora- 
tory and to point up the thinking of the student in regard to the 
important items to be considered when looking into any job. 
Each member of the group worked on his choice of three of the 
ten job analyses offered. 

The last three lessons were designed to give the students 
information on schools in the Bridgeport area. Bulletins and 
other descriptive material were read by the students. Letters 
of inquiry were written by students to the special schools which 
seemed to offer training suited to their interests and aptitudes. 
The last lesson placed emphasis on the importance of tentative 
vocational plans, stressing the need for flexibility in these plans 
and the advantage of broad training over training in specific 
skills at this stage in their education. 

Flashmeter work is part of the daily program of every student 
in our Reading Laboratory. It is used to quicken visual and
mental perception and, therefore, material selected for reading
is familiar to the class in subject matter and is in accord with 
their interests. The reading of a paragraph is, as a rule, the 
final part of the lesson. A series of ten paragraphs was prepared
by the teacher conducting the group guidance course and
presented by the teacher who had this group of students for 
flashmeter work. This served as review and as a means of 
emphasizing certain aspects of the course. 

Good citizenship means voting, paying taxes, obeying the 
laws of the land; but more than that, it means a willingness to 
be informed and to study to some degree problems of government. 
To give these boys a concept of group relationships and of the 
responsibility of the individual to the group, together with a 
broad view of government functions, was necessary if the purpose 
of giving the student a basic concept of the responsibilities of 
citizenship was to be achieved. 

Inasmuch as time did not permit this to be worked into the 
thirty lessons specifically allotted to group guidance, these 
lessons on citizenship were substituted for a number of history 
lessons. The text selected was We Are The Government by Mary 
Elting and published by Doubleday, Doran and Company, Inc. 
This is an accurate and interesting account of the various depart- 
ments and branches of the Federal Government. It is written 
simply enough to be read by these boys, and yet it is sufficiently 
adult in format and presentation that they will not feel affronted. 
This is the text used for reading. Questions to test comprehen- 
sion are prepared by the teacher. 

The weekly assemblies of the student body at the Reading 
Laboratory provide occasions for these boys to present to the 
younger students their concepts of good citizenship and their 
broadening view of their country’s development. This widens 
the horizons of the younger students, motivates the members of 
the group guidance class and gives them practice in expressing 
their opinions in public. 

One device used in the Reading Laboratory to encourage 
independent reading, is a large file card on which the student 
keeps a record of the books he has read. A shelf of books from 
the library on subjects related to this course was placed in the 
group guidance classroom. Students were encouraged to select
books for independent reading from this shelf and to add them to
their record of books read. They were invited to summarize
and evaluate these books for other members of the group. 

Most group guidance courses plan some aptitude testing for 
the members of the course. Since this was in the tradition of 
such courses, three paper-and-pencil tests were administered by 
the instructor to this experimental group in the Reading Labora- 
tory. The tests administered were the Revised Minnesota 
Paper Form Board Test, the Bennett Test of Mechanical Com- 
prehension, and the O’Rourke Mechanical Aptitude Test, 
Junior Grade. Nothing definitive was indicated by the results 
of these tests. Arrangements have now been made to administer 
Performance Tests of Aptitude to these students through the 
Personnel Division of the University of Bridgeport. 


CONCLUSIONS 


Work with this group of adolescent boys during these three 
months suggests that a group guidance course should logically 
form a part of the program of the Reading Laboratory. 

1) Interest in learning about occupations and the qualities 
which contribute to success in the world of work, motivates 
reading and, therefore, contributes to the development of reading 
skills in the student. 

2) The material used is on an adult level and, yet, within the 
ability of these students to read under guidance; therefore, 
it fills a need constantly felt in the Reading Laboratory—that 
of providing material which takes into account the interests of 
students, which is presented in a manner and format suited to 
their age, and which is within the range of their reading ability. 

3) The work of this course stimulates thinking along adult 
lines and, thus, contributes to the maturing process of these 
boys, a need usually revealed in the initial testing. 

4) The lessons on citizenship fill a definite need in that they 
contribute to the maturing process of the student, give him back- 
ground material which he missed in his regular schooling because 
of deficient reading, and stimulate his thinking and reading
along broader lines. 

5) Performance tests rather than paper-and-pencil tests 
should be given to determine the individual aptitudes of the
members of such a group. This should be done by experts who
are working regularly in this field. 

6) Through this group guidance course the student is helped 
to choose a tentative goal and to make plans for training upon 
finishing his work at the Reading Laboratory, thus making it 
more likely that the skills and work habits acquired in the Labora- 
tory will be maintained and developed. 











RELIABILITY AND VALIDITY 
OF INVOLUNTARY BLINKING AS A MEASURE 
OF EASE OF SEEING* 


MILES A. TINKER 


University of Minnesota 


In any measuring device, the problems of reliability and valid- 
ity are of high importance. The reliability of any measuring 
instrument is the consistency with which it measures the kind 
of response or behavior involved. Validity of a technique 
depends upon the fidelity with which it measures whatever it 
purports to measure. Only when adequate reliability and 
validity of an instrument, test or technique have been demon- 
strated, can one place confidence in results obtained by that 
measuring device. 

Rate of involuntary blinking has been extensively used by 
Luckiesh” and Luckiesh and Moss" as a criterion of ease of seeing 
and of readability of print. In fact data obtained by this tech- 
nique (rate of blinking) form much of the bases for conclusions 
and recommendations concerning adequate illumination intensi- 
ties and desired typographical arrangements. Luckiesh, Guth 
and Eastman?? point out that researches by Luckiesh and by his 
co-workers employing the blink technique over a period of years 
have yielded useful data in the study of level of illumination and 
of typography. In another place Luckiesh® states that rate of 
involuntary blinking while reading is a very sensitive criterion of 
ease of seeing. He also claims in this report that rate of blinking 
while performing the same task under different conditions (as 
changes in level of illumination) is by far the most sensitive, sig- 
nificant and practical criterion of ease of seeing that has been 
used. 

Since rate of blinking is considered such an important criterion 
of ease of seeing, it is of considerable importance to evaluate the 
reliability and validity of the technique. It might be noted 
further that, in using this technique, the certainty of discovered 
differences should be evaluated statistically. The purpose of the 
present study is to present and coordinate the results obtained
in a number of my studies on reliability and validity of the blink 
technique as a measure of ease of seeing. 


RELIABILITY 


In an initial study”! the blink rates were recorded for each 
successive five-minute period during thirty minutes of reading 
for two groups: seventy-four adult readers and sixty-four adult 
readers. Conditions of the experiment were comparable to 
those specified by Luckiesh, Guth and Eastman.'* Blink rates 
in both sets of data tended to increase slightly in the successive 
five-minute periods of reading. The blink-rate consistency or 
reliability coefficient was fairly high from one five-minute period 
to the next period, i.e., mean about .87. The trends were about 
the same for adjacent ten-minute periods of reading (.85). From 
initial to final five minutes (of the thirty minutes), the reliability 
dropped to about .50. For the ten-minute periods of reading 
at the beginning and end of the thirty-minute period, the reliabil- 
ity was about .65. For the first 15 minutes vs. the last 15 
minutes the reliability coefficient was .75. 

In a second investigation,”? frequency of blinking was recorded 
while reading lower-case text and all capital text for ten minutes 
each during each of two experimental sessions. There were 
sixty subjects. Records were kept for each five minutes of the 
ten-minute periods. On comparing the reliability of the blink 
frequency from one session to the next, the coefficients were 
uniformly high: .86 for five-minute periods; .89 for ten-minute 
periods; and .94 for twenty-minute periods. 

A third study** was concerned with blink rate in reading book 
type and newsprint. There were sixty subjects. Records were
kept for each five minutes of the ten-minute reading periods.
For the five-minute periods the reliabilities were .82 and .84; for
the ten-minute periods the reliability was .95.

In a final study,?* forty-two subjects read for fifty-five minutes 
under two footcandles of light and for a like period under one 
hundred footcandles. Blink frequency was recorded for the 
initial and final five minutes of each period. For reading under 
two footcandles the reliability (first vs. final five minutes) was 
.87; for one hundred footcandles, .57. 

When data in all four studies are considered, trends in reliability 
coefficients are as follows: For successive five-minute or ten-minute
periods of reading, and for five-minute or ten-minute
periods of reading at two separate experimental sessions, the 
coefficients are relatively high—around .85 to .95. This repre- 
sents highly satisfactory consistency of performance. When 
initial and final five minutes of thirty to fifty-five minutes of 
reading are compared, the reliabilities fluctuate around .50. 
But in one case out of four a coefficient similarly derived was .87. 
Ten minutes of reading yields slightly higher reliability than five 
minutes. It may be concluded that consistency of performance 
in measuring blink frequency during reading is adequate where 
group comparisons are involved although these coefficients at 
times barely reach the minimum requirements when initial and 
final five minutes of a lengthy period of reading are compared. 
For any short period of five to fifteen minutes of reading, the con- 
sistency or reliability of blink frequency is highly satisfactory. 
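
The reliability coefficients discussed above are correlations between the blink counts of the same readers in two periods. A minimal sketch of such a computation follows, assuming the ordinary product-moment correlation; the function name and the six pairs of blink counts are hypothetical illustrations and are not data from any of the studies cited.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical blink counts for six readers in two five-minute periods
first_period = [12, 30, 25, 8, 41, 19]
second_period = [15, 33, 24, 11, 45, 18]

reliability = pearson_r(first_period, second_period)
print(round(reliability, 2))
```

A coefficient near 1 means the readers kept nearly the same rank order of blink frequency from one period to the other, which is the sense of "consistency" used in this section.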

In most other studies of blink frequency, reliability of perform- 
ance has not been cited. In Carmichael and Dearborn’s investi- 
gation,® groups of ten subjects each read on different occasions 
various kinds of material for six hours. Blinks were recorded 
for the first five minutes of reading and for the last five minutes 
of the successive twelve half-hour intervals. Correlations were 
computed between initial scores and the scores at the end of each 
successive half hour. These (reliability) correlations reveal how 
consistently the subjects maintained their original positions in 
the group. Although these coefficients revealed wide variabil- 
ity, ranging from .06 to .99, four-fifths of them were .70 or above. 
Very few were below .50. In most of their samples, therefore, 
the reliability of the blink technique seems adequate. 


VALIDITY 


In various reports,® !!!)!2.13.14 Luckiesh and his coworkers have 
employed and supported rate of involuntary blinking as an 
adequate measure of readability of print, of ease of seeing, visual 
efficiency, ocular fatigue, etc. The assumption is that rate of 
involuntary blinking increases with the severity or difficulty of 
the task and thus reflects the degree of effort expended by the 
subject in performing the visual task. The validity of a measure 
should be established in terms of how well it agrees with a criterion.
And a criterion is ordinarily considered to be some measure
which experience has shown to be an adequate or true measure
of the trait under consideration.

Three studies have been completed in our laboratory to check 
the validity of blink rate as a measure of ease of seeing. In the 
first investigation,”? the relative readability of text in all-capital 
printing was compared with text in lower-case. Earlier studies 
have established that all-capital text is significantly less readable 
than text in lower-case. Tinker and Paterson” demonstrated 
that all-capital text is read about twelve per cent slower than 
lower-case text. The same trend was discovered in an eye- 
movement study.” Measuring the trend in blink rate in a 
similar comparison will, therefore, furnish a check on blink rate 
as a measure of readability. 

The experiment was carried out in a light laboratory with 
ten footcandles of well diffused illumination. Sixty university 
students served as readers for two experimental sessions, reading 
lower-case text followed by all-capital text at one session and in 
the reverse order at the second session. A counter-balanced 
experimental design was used and the subjects were adapted to 
the illumination prior to the reading. Comparisons were made 
for five, ten and twenty minutes of reading. Both number of 
blinks per period and rate of reading were measured. The find- 
ings revealed no significant differences in rate of blinking while 
reading text in all-capitals in comparison with lower case. But in 
all comparisons, the text in all-capitals was read significantly 
slower than text in lower-case, i.e., 9.53 to 19.01 per cent slower. 
The findings suggest that frequency of blinking is an unsatis- 
factory criterion of readability of print. 

A second study‘ was concerned with the readability of book 
print and newsprint in terms of blink rate. In earlier work, 
Paterson and Tinker’* have shown that newsprint is read sig- 
nificantly slower than book print. All evidence'*” indicates 
that ease of seeing large (book) type is greater than for small 
(newsprint) type. A comparison of blink rates for these mate- 
rials, therefore, should provide a check on the validity of blink 
rate as a measure of ease of seeing. In the present study the 
book type was twelve-point; the newsprint, seven-point. Speci- 
fications for experimental design and procedures laid down by 
Luckiesh were followed. For details see the original report.‘ 
The results show significantly fewer blinks (61.27 vs. 69.37) for 
reading the newsprint. This trend is directly opposite to what
should have occurred if rate of blinking is an adequate measure 
of ease of seeing, i.e., there should have been more blinks in 
reading the newsprint. Obviously, further checking of the 
validity of the blink technique is needed. 

In the third study*® concerned with influence of illumination 
intensity, the specifications of Luckiesh, Guth and Eastman!? 
and Luckiesh and Moss" concerning experimental design and 
procedures were followed in detail and the same reading material 
was used. Forty-two university students read for fifty-five 
minutes under two and under one hundred footcandles of light. 
Frequency of blinking during the first five and the last five
minutes of the fifty-five-minute period was recorded. Details
of conditions and procedures are given in the original report. 
The problem in this experiment is to check whether frequency of 
blinking increases significantly more for reading fifty-five minutes 
under the dim than under the bright light. 


TABLE I.—CHANGES IN RATE OF BLINKING WHEN READING
FIFTY-FIVE MINUTES UNDER TWO AND UNDER ONE HUNDRED
FOOTCANDLES OF LIGHT. N = 42 READERS.


             First 5 Min.     Last 5 Min.
Foot-                                                        %      Diff./
candles      Mean     SD      Mean     SD      r    Diff.   Diff.   σ Diff.
  2          25.8    23.2     33.9    27.9    .87    8.1    31.6     5.60
100          26.8    19.4     35.4    24.5    .57    8.6    32.0     3.97


The basic data are given in Table I. Examination of the 
means reveals that there is an appreciable increase in number 
of blinks from initial to final five minutes of reading under both 
levels of illumination. These changes in blink rate are significant 
at the one per cent level as shown by the critical ratios in the 
last column of the table. The same level of significance is 
obtained when the t-test is applied to the data. 
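
As an illustration of what a critical ratio for a difference between two correlated means involves, the sketch below applies a common textbook form: the mean difference divided by its standard error, with the correlation between the two periods taken into account. The formula and the function name are assumptions made for illustration, since the article does not state how its ratios were computed. The inputs are the rounded entries printed in Table I, so the resulting ratios need not equal the printed last column.

```python
from math import sqrt


def critical_ratio(mean_first, sd_first, mean_last, sd_last, r, n):
    """Difference between two correlated means divided by its standard error.

    Assumed textbook form: SE = sqrt((s1^2 + s2^2 - 2*r*s1*s2) / n).
    """
    diff = mean_last - mean_first
    var_diff = sd_first ** 2 + sd_last ** 2 - 2 * r * sd_first * sd_last
    se_diff = sqrt(var_diff / n)
    return diff / se_diff


N = 42  # readers, as given in the caption of Table I

# Rounded means, SDs and correlations as printed in Table I
cr_2fc = critical_ratio(25.8, 23.2, 33.9, 27.9, 0.87, N)
cr_100fc = critical_ratio(26.8, 19.4, 35.4, 24.5, 0.57, N)

ONE_PERCENT_CUTOFF = 2.576  # two-tailed normal deviate for the one per cent level
print(cr_2fc > ONE_PERCENT_CUTOFF, cr_100fc > ONE_PERCENT_CUTOFF)  # prints: True True
```

Under this assumed formula both increases still exceed the one per cent cutoff, in line with the statement in the text. The higher correlation under two footcandles shrinks the standard error, which is why a similar raw difference gives a larger ratio there.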

Under two footcandles, the increase in blink rate is 8.1 or 31.6
per cent. Similarly, under one hundred footcandles, the increase
is 8.6 or 32.0 per cent. There is, therefore, an approximately
identical amount of increase in blink rate for reading about an
hour under two and under one hundred footcandles. 

It is universally recognized, in terms of literature on the sub- 
ject, that reading under two footcandles is a more severe visual 
task than reading under one hundred footcandles. There is no 
trend in the blink data of this experiment to support this view. 
Under the conditions of this experiment, therefore, rate of invol- 
untary blinking is not a valid criterion of ease of seeing in reading. 


DISCUSSION 


Reliability or consistency of performance in rate of involuntary 
blinking appears adequate. In those studies where the reliabil- 
ity has been computed, the coefficients tend to run fairly high 
for most samples. Nevertheless, since low reliability occurs 
occasionally, the reliability should be computed in each investiga- 
tion where the blink technique is used. 

Validity of the blink technique as a measure of ease of seeing, 
however, is in question. In Luckiesh and Moss" and Luckiesh, 
the findings reported uniformly show increases in rate of blinking 
under conditions which presumably involve more difficult visual 
tasks such as with reduction in size of type, intensity of illumina- 
tion, uniformity of lighting, etc. They conclude that blink rate 
as a criterion of ease of seeing is uniquely satisfactory. But 
check studies have failed to obtain data confirming the trends 
published by these authors. 

McFarland, Holway and Hurvich,'* in view of their findings 
and analyses, conclude that a high blink rate need mean neither 
an increase in visual fatigue nor an increase in difficulty of seeing. 
In his study, Bitterman! found a decrease in work output for 
reading six-point in comparison with ten-point type but no sig- 
nificant difference in blink rate for reading the two sizes of type. 
In the same study he found no significant difference in blink rate 
for reading under three and under ninety-one footcandles of light. 
In both comparisons the slight but insignificant differences were 
in favor of the more severe visual task which is contrary to the 
trends found by Luckiesh and his co-workers. Bitterman con- 
cludes that rate of blinking cannot be accepted as a criterion of 
readability and ease of seeing. In a later analysis Bitterman 
and Soloway‘ demonstrated that expenditure of greater effort in 
mental work is not necessarily reflected in an increased frequency
of blinking. 

Simonson and Brozek,’” in studying the effects of illumination 
level on visual performance and fatigue, employed the severe task 
of visual discrimination in perceiving very small letters under 
two, five, fifteen, fifty, one hundred and three hundred foot- 
candles of illumination. Work output increased significantly in 
going from two, five or fifteen footcandles to the higher levels of 
illumination but there were no significant changes in blink rate. 
They conclude that the blink technique cannot be employed as a 
fatigue index. In a later study on the effect of spectral quality 
of light on visual performance and fatigue, these same authors” 
found that blinking rate did not show significant changes between 
illuminants although there were significant differences in certain 
performance and fatigue tests. In Carmichael and Dearborn’s 
experiment® samples of blinking were taken at half-hour intervals 
during six hours of normal reading and microfilm reading. There 
were more blinks for normal reading than for the microfilm 
reading although most of the differences were not significant. 
Conclusions are not clear here, for adults read microfilm faster
and high-school subjects were more rapid in the normal reading.
In studying the influence of typographical variations on reading
by visually handicapped children, McNally'® found no significant
differences in frequency of blinking from one kind of text to 
another. Rate of reading revealed no significant differences 
either. Hence the whole set of data seems equivocal. 

When all studies on the blink technique as a measure of ease 
of seeing are examined the following trends are revealed: (A) 
Luckiesh and his co-workers always report data that appear to 
indicate a higher rate of blinking for situations where the visual 
task is more exacting. (B) In all other investigations either (a) 
there are no significant differences in frequency of blinking in 
going from easy to severe visual tasks or (b) the trends of the 
data are equivocal. (C) There are, therefore, serious doubts 
concerning the validity of the blink technique as a measure of 
ease of seeing. 

In reply to criticisms’? of the blink technique, Luckiesh!? 
has argued that workers in other laboratories have not duplicated 
his experimental set-up and methods of procedure, nor maintained 
the same reading attitudes in the subjects. Nevertheless, inability
to confirm Luckiesh’s results cannot be due to these factors
for Tinker’s recent experiment?* duplicated all conditions and 
arrangements specified by Luckiesh. The data revealed no 
significant differences in blink rate for reading under two vs. 
one hundred footcandles of light. 

Luckiesh" and Luckiesh, Guth and Eastman! imply that the 
increase in frequency of blinking from the beginning to the end 
of a lengthy period of reading supports the view that the tech- 
nique is valid as a measure of ease of seeing. They point out 
that when conditions are comparable to those in their experi- 
ments, findings are similar. The following data were cited (per 
cent increase in blink rate is given in parentheses at end of items): 


Luckiesh and Moss (N = 11), 60 min. of reading....... (31)
Hoffman (N = 30), 60 min. of reading................. (27)
Tinker (N = 74), 30 min. of reading.................. (25.2)
Tinker (N = 64), 30 min. of reading.................. (36.2)


It is obvious that the trends toward increase in frequency of 
blinking from beginning to end of a period of reading agree. It 
is doubtful, however, that this trend implies anything concerning 
the validity of the blink technique as a measure of ease of seeing. 
This trend toward an increase in blink rate is found in practically 
every experiment. For instance, Tinker®® found an increase of 
31.6 and 32 per cent while reading fifty-five minutes under two 
and one hundred footcandles, respectively. And Carmichael and 
Dearborn’ found increases of about twenty-one, twenty, thirty- 
nine, and thirteen per cent for reading from two to six hours 
under sixteen footcandles. Many of these differences are statis- 
tically significant. Hoffman’*® has suggested that the change 
(presumably of a deleterious nature) in blink rate with continu- 
ous reading is the effect of a work factor (work decrement phe- 
nomenon). This seems a reasonable inference. Apparently 
Carpenter,® in his study of changes in rate of blinking during 
visual search, has interpreted this work decrement phenomenon 
as validating the technique as a criterion of ease of seeing. His 
increase in blinks from the first to the last half hour of a two- 
hour period was forty-three per cent. This is somewhat greater 
than the increases (about thirty-two per cent) found for one to 
two hours of reading. 

In any case, if differential increase in frequency of blinking due 
to visual work is to be employed as a criterion of ease of seeing,
the increase during a severe visual task should be greater than 
during an easy task by a statistically significant amount. No 
independent worker, however, has been able to confirm the results 
reported by Luckiesh and his co-workers when such comparisons 
have been made. 

A word may be added concerning the statistical treatment of 
blink-rate data. Tinker?* has questioned the appropriateness of 
the geometric mean as employed with these data by Luckiesh 
and his co-workers. The same criticism is offered by Hoffman. 
He also severely criticizes their use of the percentage technique
and their basing of conclusions on percentage differences
rather than on raw-score differences. Percentage
differences are notoriously unreliable. Furthermore, if the raw 
scores are below 100 (as most of them are), percentages magnify 
the differences. Thus insignificant raw score differences may 
seem large when put into percentage differences. Furthermore, 
all obtained differences should be evaluated for statistical 
significance. 
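
The arithmetic behind the point about percentages can be sketched briefly. In the example below the same raw difference is expressed as a percentage of several base scores; the raw difference of eight blinks and the base values are hypothetical, chosen only to show how a base below 100 enlarges the percentage figure.

```python
def percent_difference(base, other):
    """Difference between two scores expressed as a percentage of the base score."""
    return 100.0 * (other - base) / base


RAW_DIFFERENCE = 8.0  # hypothetical raw-score difference, in blinks

# The same raw difference looks larger in percentage terms as the base shrinks
for base in (25.0, 50.0, 100.0, 200.0):
    pct = percent_difference(base, base + RAW_DIFFERENCE)
    print(f"base {base:5.0f}: raw difference {RAW_DIFFERENCE:.0f} = {pct:.1f} per cent")
```

With a base of 25 the raw difference of eight becomes 32 per cent, while with a base of 100 it is 8 per cent. Only when the base exceeds 100 is the percentage smaller than the raw difference.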

The trend of the data reported here plus a survey of related 
experiments raises serious questions concerning the validity of the 
blink technique as a measure of ease of seeing and readability 
of print. On the one hand are the data reported by Luckiesh 
and his co-workers. They!? conclude that frequency of blinking 
‘‘while reading is a very sensitive criterion of ease of seeing and 
that the results are significantly related to tenseness and other 
results of reading and other critical seeing.’”’ On the other hand, 
no independent worker, even when duplicating the experimental 
design of Luckiesh, has been able to confirm the results of 
Luckiesh and his co-workers. These independent workers all 
find that increasing the severity of the visual task has no differ- 
ential effect upon frequency of blinking. Although the fre- 
quency of blinking does increase as reading is continued for 
thirty minutes to an hour or more, the rate of increase is no 
greater for severe than for easy visual tasks. Lack of confirmation
of the Luckiesh findings can no longer be assigned to differences
in experimental design.*® The basis for the apparent
consistency in the data from Luckiesh’s laboratory, which
cannot be confirmed by others, is not clear at this time.

In any case, failure to substantiate the rate of blinking as a 
criterion of ease of seeing and of readability must lead to its
rejection as an adequate technique in studies of visual work. 
Bitterman and Soloway® tend to agree with these conclusions. 
This would mean that much of the evidence put forward by 
Luckiesh” and Luckiesh and Moss!‘ to support recommendations 
for desirable practice in the field of illumination and typograph- 
ical arrangements is invalid and that the recommendations, there- 
fore, are without sound foundation. 

Various techniques, including frequency of blinking, have been 
promoted as measures of efficiency in terms of the physiological 
cost of the work. None of these have been satisfactory. It is 
generally admitted, however, that an adequate measure of energy 
expenditure during visual work would yield important data to 
supplement work-output results. In the absence of the former 
it is feasible? to rely upon data from performance measures until 
more sensitive and valid indices of cost have been developed. 


SUMMARY AND CONCLUSIONS 


1) The results of four experiments concerned with reliability 
and validity of frequency of blinking as a measure of ease of 
seeing are reported. 

2) Consistency of response or reliability of blink rate is ade- 
quate as used in experiments on readability and ease of seeing. 

3) Frequency of blinking is not a valid measure of readability
and ease of seeing.


THE ORIGIN AND DEVELOPMENT OF THE
SPANISH ATTITUDE TOWARD THE ANGLO 
AND THE ANGLO ATTITUDE TOWARD 
THE SPANISH* 


GRANVILLE B. JOHNSON, JR. 
Arizona State College at Flagstaff 


The problems derived from intergroup attitudes are varied and 
their solution is particularly important at the present time from 
the standpoint of national and international maladjustment. 
Research which probes the origins and traces the development 
of racial attitudes at the molar level would make a positive con- 
tribution to developmental and social psychology. 

Underlying the present investigation is the explicit assumption 
that any measurable differences of personality between races or 
ethnic groups are due to an environmental and not a hereditary 
function. We are in agreement with the conclusions of Boas:

“The occurrence of hereditary mental traits that belong to a
particular race has never been proven. The available evidence
makes it much more likely that the same mental traits appear
in varying distribution among the principal racial groups. The
behavior of an individual is therefore determined not by his
racial affiliation, but by the character of his ancestry and his
cultural environment. We may judge the mental characteristics
of families and individuals, but not of race.”¹


THE PROBLEM 


The purpose of the present investigation was to ascertain the 
degree of prejudice in progressively mature age groups of Spanish 
and Anglo subjects utilizing the Projective Test of Racial Atti- 
tudes and to reveal something of the origin and development of 
racial attitudes. 





* This is the second in a series of three articles concerned with the origin
and development of racial attitude. The previous paper, “An Experimental
Projective Technique for the Analysis of Racial Attitude,” appeared
in the April issue of this JOURNAL (41:4, p. xxx). In this earlier article the
properties of the Projective Test of Racial Attitude, the major tool in the
present investigation, were analyzed.
MATERIAL AND SUBJECTS


Materials.—1) The Projective Test of Racial Attitudes com- 
posed of six carefully selected pictures relevant to race situations 
and provocative of projected response was utilized for the analysis 
of racial attitudes. There was a hero figure with whom the 
subject identified himself and thus projected something of the 
dynamics of his personality into the depicted situations. 

The situations conducive to projected response were identical 
except for the ethnic group variable. For Anglo subjects there 
were six cards of total Anglo content and six with an Anglo hero 
in a Spanish situation. For Spanish subjects the situations 
were reversed with a Spanish hero in an Anglo situation in six 
cards with identical cards without the Anglo element. 

2) A tabulation sheet was utilized for recording the dynamics 
of responses. 

Subjects.—1) Anglo: thirty male subjects at each of the four-, eight- and
twelve-year levels (N = 90), from public schools in a southwestern
town.

2) Spanish: thirty bilingual male subjects at each of the four-, eight- and
twelve-year levels (N = 90), from public schools in a southwestern
town.

A further limitation was that members of both groups were 
required to have lived in this specific town for the year prior to 
testing. 

DEVELOPMENTAL CURVES 

The first phase of the analysis was to trace the development 
of the Spanish attitude toward the Anglo and the Anglo attitude 
toward the Spanish. This was accomplished by computing the 
total number of responses in the Spanish-Spanish (Anglo-Anglo) 
columns and the Spanish-Anglo columns at each age level, then 
finding the significance of the difference between them. The 
change in the relative number of responses and resultant change 
in the significance of the difference was noted with increase in 
age. 
Spanish.—Table I presents the significance of the difference 
between same group and not same group of ninety Spanish 
subjects, four, eight, and twelve years of age on the twelve cards 
of the Projective Test of Racial Attitudes. The difference 
indicated greater reaction in the Spanish-Anglo situations. 
The analysis included: Effect of Environment (Frustrating);
Reaction to Environment (Frustration); Adequacy of Principal
Character (Superior); Ending Hero (Defeat); and Ending
Theme (Unsatisfactory).


TABLE I.—SIGNIFICANCE OF DIFFERENCE BETWEEN
SPANISH-SPANISH AND SPANISH-ANGLO RESPONSES USING THIRTY
SPANISH SUBJECTS AT THE FOUR-, EIGHT- AND TWELVE-YEAR
LEVELS IN THE PROJECTIVE TEST OF RACIAL ATTITUDES*


          Eff. of Env.   Reac. to Env.   Adeq. of Pr. Char.   Ending Hero   Ending Theme
Age
Levels    Frustrating    Frustration     Superior             Defeat        Unsatisfactory
  12         4.05           5.57            5.00                2.49           4.86
   8         3.97           5.33            4.38                2.54           4.42
   4          .58            .68             .81                2.74            .91


* Difference indicates greater reaction in Spanish-Anglo situations.


With reference to Table I it is quite evident that what happens 
in the temporal interval between the fourth and eighth years 
contributes much more to development of racial attitudes than 
the subsequent years studied. It is observed, too, that the 
organism’s tendency to react to the environment in a manner 
betraying frustration manifests the greatest change in the first 
four years. It is important to note that this category is one 
of the most highly related with the total prejudice score and 
therefore is one of the most indicative of prejudice. 

In both the Spanish and Anglo groups it is seen that the Spanish 
hero is defeated in the Spanish-Anglo situations as frequently 
at the four-year level as at the two older age levels. At all age 
levels the hero or character with whom the subject identified 
himself is defeated more often in the not same group than he is in 
the same group. One hypothetical explanation of this finding
is that, though the result is the same, the cause may arise
from different sources. In view of this finding, defeat of the
principal character by a member of the other ethnic group
may be considered a criterion of prejudice.
Table II presents the significance of the difference between
same group and not same group of total Spanish subjects at the
chosen age levels. The analysis includes: Effect of Environment
(Neutral); Adequacy of Principal Character (Equal); Ending
Hero (Indeterminate); Ending Theme (Indeterminate). The
difference indicates greater reaction in the Spanish-Spanish
situations.


TABLE II.—SIGNIFICANCE OF DIFFERENCE BETWEEN
SPANISH-SPANISH AND SPANISH-ANGLO RESPONSES USING THIRTY
SPANISH SUBJECTS IN EACH AGE GROUP: NEUTRALITY
CATEGORIES*


          Eff. of Env.   Reac. to Env.   Ending Hero      Ending Theme
Age
Levels    Neutral        Equal           Indeterminate    Indeterminate
  12        -3.21           9.61            7.79              1.51
   8         3.31           9.03            7.12              1.18
   4          .23           2.19            1.52              0


* Difference indicates greater reaction in Spanish-Spanish situations.


A neutral, equal, or indeterminate response displays absence 
of affect to the specific situation. Though these particular 
categories do not reveal the direction of prejudice swing, they 
do demonstrate positive or negative reaction to the depicted 
situation. In Table II it is noted that the shift from neutrality 
in the Spanish-Spanish categories to positive or negative reaction 
in the Spanish-Anglo categories has a marked relationship with 
age. 
Reflecting the organism’s tendency to be affected by the 
Spanish-Anglo environment in either a positive or negative 
manner, the neutral column shows the greatest difference between 
Spanish-Spanish and Spanish-Anglo categories. This factor 
would possibly indicate that discrimination of and sensitivity to 
different aspects of the environment are prerequisite to more 
complex prejudice manifestations. It has been found that there 
is a positive relationship between prejudice and ability to recog- 
nize stereotypes. ' 

In ‘Ending Theme,’ (Spanish-Spanish), because no indeter- 
minate responses were elicited at the four-year level (though 
there were six at the Spanish-Anglo level) no critical ratio could
be ascertained. 

The critical ratios in Table III represent the significance of the 
difference between Spanish-Spanish and Spanish-Anglo responses 
for the Spanish age levels. The difference indicates greater 
reaction in the Spanish-Spanish categories in the ‘Effect of 
Environment,’ ‘Reaction to Environment,’ and ‘Ending Theme.’ 
It indicates greater reaction in the Spanish-Anglo categories in 
the ‘Adequacy of the Principal Character,’ and ‘Ending Hero.’
The analysis included: Effect of Environment (Helpful); Reaction
to Environment (Satisfactory); Adequacy of Principal
Character (Subordinate); Ending Hero (Victory); Ending
Theme (Satisfactory).


TABLE III.—SIGNIFICANCE OF DIFFERENCE BETWEEN
SPANISH-SPANISH AND SPANISH-ANGLO RESPONSES USING THIRTY
SPANISH SUBJECTS IN EACH AGE GROUP IN THE PROJECTIVE
TEST OF RACIAL ATTITUDES


          Eff. of Env.   Reac. to Env.    Adeq. of Pr. Char.   Ending Hero   Ending Theme
Age
Levels    Helpful*       Satisfactory*    Subordinate†         Victory†      Satisfactory*
  12        -1.89          -6.06             3.75                5.56          -4.82
   8        -1.47          -5.33             3.59                4.25          -4.65
   4         -.71           -.68             2.58                 .16          -1.67


* Difference indicates greater reaction in Spanish-Spanish situations.
† Difference indicates greater reaction in Spanish-Anglo situations.


With reference to the data in Table III, it is noted that only
Ending Hero (Victory) and Adequacy of Principal Character
(Subordinate) have more positive reaction in the Spanish-Anglo
situation than in the Spanish-Spanish. This means, then, that
both ascendance and submission become more manifest with age
and may be considered criteria of prejudice. Though subordination
of the hero in the Spanish-Anglo situation does develop
with age, the difference is already statistically significant at the
four-year level. Evidently, this feature is one of the first to
develop with ego evolvement and group identification.

As age increases, it is noted, the reaction to the Spanish-Anglo
environment becomes less satisfactory, as does the feeling that
elements of this environment are helpful. The theme also ends
less optimistically.

Anglo.—The previous data demonstrated the phases and 
characteristics of attitudinal development of the Spanish sub- 
jects. The following analysis illustrates how and in what 
different age levels the Anglo attitude develops in adjustment 
to the Spanish. 

Table IV presents the significance of the difference existing
between Anglo-Anglo reaction and Spanish-Anglo reaction of the
total Anglo group. This table includes: Effect of the Environment
(Frustrating); Reaction to Environment (Frustration);
Adequacy of Principal Character (Superior); Ending Hero
(Defeat); and Ending Theme (Unsatisfactory).


TABLE IV. SIGNIFICANCE OF DIFFERENCE BETWEEN ANGLO-ANGLO
AND SPANISH-ANGLO RESPONSES USING THIRTY ANGLO SUBJECTS IN
EACH AGE GROUP ON THE PROJECTIVE TEST OF RACIAL ATTITUDES*

Age      Eff. of Env.  Reac. to Env.  Adeq. of Pr. Char.  Ending Hero  Ending Theme
Levels   Frustrating   Frustration    Superior            Defeat       Unsatisfactory
  12        5.27          6.16           7.95               2.32          6.34
   8        1.85          3.05           3.37               2.10          5.63
   4        1.84          2.07           2.03               1.58          4.78

* Difference indicates greater reaction in Spanish-Anglo situations.


It is noted, with reference to the foregoing table: 

1) The four-year Anglo is more biased in his attitude toward 
the Spanish than is the Spanish of the same age group toward the 
Anglo. It is apparent that the Anglo’s concept of the Spanish 
becomes negative earlier in life than the Spanish’s toward the 
Anglo. This is concluded though it is realized that the average 
Anglo child of four years has had less intercourse with Spanish 
than the average Spanish of that age has had with Anglos. 
Evidently, the exaggerated attitude on the part of the Anglo
with little experience concerning the Spanish has arisen from
some source other than direct contact.

2) Though the four-year Anglo was more prejudiced in his
attitude toward the Spanish than the Spanish at that age level
was toward the Anglo, little development between the fourth
year and the eighth year was observed in Anglo attitude. This
finding was in contradistinction to the Spanish developmental
curves, which showed their greatest increment during this period.

3) The Anglo critical ratios between same and not same
situations were greater at the twelve-year level than the Spanish
(though not significantly so; C.R. = 1.211, significant at .20
level).

Further analysis was possible only after all categories had been
analyzed.

Table V represents differences in reaction to Anglo-Anglo and 
Spanish-Anglo stimuli with respect to neutral and equal cate- 
gories. As previously pointed out, equality and neutrality are 
manifestations of lack of emotional arousal and bias. 


TABLE V. SIGNIFICANCE OF DIFFERENCE BETWEEN ANGLO-ANGLO
AND SPANISH-ANGLO RESPONSES USING THIRTY ANGLO SUBJECTS IN
EACH AGE GROUP ON THE PROJECTIVE TEST OF RACIAL ATTITUDES*

Age      Eff. of Env.  Adeq. of Pr. Char.  Ending Hero    Ending Theme
Levels   Neutral       Equal               Indeterminate  Indeterminate
  12        3.32         10.87                9.53           1.54
   8         .79          5.60                5.09           0
   4        1.55          2.68                2.88            .38

* Difference indicates greater reaction in Anglo-Anglo situations.


It may be remarked from the foregoing data that neutral 
responses elicited from the Anglo group to the Anglo-Anglo 
situations became progressively weighted in comparison with the 
Spanish-Anglo situation in the older age levels. This is indica- 
tive, as previously suggested, of the increased emotional arousal 
of the older subjects to the Spanish-Anglo situations depicted. 

It is noted that the adequacy of the hero figure, closely par- 
alleled by the ultimate outcome of the hero, is most precipitous 
in loss of neutral reaction. 

Ending Theme (Indeterminate) is incomplete at the eight-year
level because no response was observed in the Anglo-Anglo
category though there were seven at the Spanish-Anglo level. In
this situation the satisfactory ending (Spanish-Anglo) may be
less weighted with emotional factors than the indeterminate
category.

Table VI presents the significance of the difference manifested
by the Anglo subjects between response to the Anglo-Anglo and
response to the Spanish-Anglo stimuli. The categories under
consideration are: Effect of Environment (Helpful); Reaction to
Environment (Satisfactory); Adequacy of Principal Character
(Subordinate); Ending Hero (Victory); and Ending Theme
(Satisfactory). The difference indicates greater reaction in
Anglo-Anglo categories in ‘Effect of Environment,’ ‘Reaction to
Environment,’ and ‘Ending Theme.’ It indicates greater
reaction in Spanish-Anglo categories in ‘Adequacy of Principal
Character’ and ‘Ending Hero.’


TABLE VI. SIGNIFICANCE OF DIFFERENCE BETWEEN ANGLO-ANGLO
AND SPANISH-ANGLO RESPONSES USING THIRTY ANGLO SUBJECTS IN
EACH AGE GROUP ON THE PROJECTIVE TEST OF RACIAL ATTITUDES

Age      Eff. of Env.  Reac. to Env.  Adeq. of Pr. Char.  Ending Hero  Ending Theme
Levels   Helpful*      Satisfactory*  Subordinate†        Victory†     Satisfactory*
  12        3.33          6.16           2.87               7.99          6.64
   8         .019         3.04           2.91               3.77          6.42
   4         .005         2.07            .01                .02          4.99

* The difference indicates a greater reaction in Anglo-Anglo situations.
† The difference indicates a greater reaction in Spanish-Anglo situations.


It is observed from the data in Table VI that victory and sub- 
ordination are more prevalent in the Anglo’s reaction to Spanish- 
Anglo situations than to Anglo situations alone. This finding 
parallels that derived from the Spanish data and the conclusions 
from these data, that ascendance and submission are both mani- 
festations of prejudice, apply to both ethnic groups. 


OLDER COMPARED WITH YOUNGER IN EACH AGE GROUP 


It is evident from the foregoing analysis that the formation of
attitude does not progress at a uniform rate. The Spanish
between four and eight years of age and the Anglos between
eight and twelve manifest a marked increase in development over
that found in other age levels. Evidently, too, there is a period
of acceleration particularly at the Anglo level before the four-year
level.

The irregularities of growth manifested in the developmental 
curves demonstrate the impossibility of utilizing extrapolation for 
prediction, though interpolation is necessarily and legitimately 
used. The only technique which could be employed to ascertain 
attitudinal development below the four-year level was to com- 
pare the younger members with the older members in each age 
group. This was significant only at the four-year level, but the 
other levels were analyzed, too, to note fluctuation which would 
aid in interpreting the findings at the early age levels. 

The technique employed was to dichotomize each age level of 
each ethnic group by using as samples the twelve older and the 
twelve younger subjects. It was realized that the small sample 
would preclude accentuated fluctuations. 

Because of the small sample, inconsistencies were manifest, 
but it was noted, however inconclusively, that the Spanish and 
Anglos at the three-and-one-half age level rather consistently 
elicited less difference between reaction to same group and not 
same group stimuli. This would indicate that, in one year, a 
measurable increment in attitudinal development is manifest 
in both ethnic groups. The data suggest that the origin of
Spanish prejudice lies very close to the three-and-one-half-year
level, while the genesis of Anglo prejudice appears somewhat
before this time.


SUBJECTS WITH SPANISH PARENT AND ANGLO PARENT 


An interesting indication of accentuated prejudice was mani- 
fested upon analysis of six subjects who had one Spanish and one 
Anglo parent. Though the number was small, it was found that 
the subject’s frustration and concomitant aggression in the 
situations depicted were higher than the total for the specific 
age group. The general prejudice score, too, was higher toward 
one ethnic group or the other. An attempt was made to ascer- 
tain the factors contributory to swing direction. 

At first it was hypothesized that the subject adopted the
attitude of the dominant parent. This was borne out by the subjects
usually being prejudiced for the ethnic group to which the father
belonged. In one situation, however, it was found that two
brothers (eight and twelve) who lived together had opposite 
swings of prejudice. It was tentatively concluded with reference 
to this one situation, at least, that the subjects with one Anglo 
and one Spanish parent were prejudiced in the same direction as 
the parent with whom they identified themselves. This was 
borne out by subjective evaluation and may possibly explain the 
rather high relationship manifest between parent dominance and 
attitude. It was also noted that there was a tendency for the 
subject who had identified himself with the mother to have the 
girl as heroine in the boy-girl situations of the Projective Test 
of Racial Attitudes. 

The accentuated aggression pattern might be attributed to 
tension resulting from conflicting tendencies. 


SUMMARY OF FINDINGS IN DEVELOPMENTAL ANALYSIS 


Both ethnic groups manifest a clear picture of development of 
racial attitude from the early age levels to the older. Though the 
attitudinal development of the two groups varied qualitatively 
and quantitatively, a parallel growth was still in evidence. 

An evaluation of the areas of personality studied demonstrated 
that each revealed either positive or negative prejudice. Inter- 
estingly enough it was ascertained that both superiority and 
submission, defeat and victory are expressions of prejudice— 
different patterns of attempt to adjust to conflict. 

Qualitative analysis at the twelve-year level does not reveal a
marked difference. Quantitatively, however, the Anglos appear
more prejudiced than the Spanish though not significantly so
(critical ratio at approximately .25 level of significance).

It is interesting to compare quantitatively the differences in 
attitudinal growth between the Spanish and Anglos at the 
selected age levels. The Spanish at the four-year level appeared 
less prejudiced than the Anglos of the same age (significance of 
the difference at .25 level), but during the ensuing four years this 
negative attitude developed to a level approximating that of the 
twelve-year-olds of that ethnic group. In distinction to this 
sequence of growth, the Anglo attitude developed little between 
the fourth and eighth years. Rapid acceleration, however,
during the next four years (corresponding in developmental
progress to the Spanish four- to eight-year period) put the Anglo
at the highest level of prejudice of all groups and ages studied.

Qualitative analysis revealed features of attitudinal development
for the two ethnic groups which demonstrated something
of the dynamics of personality adjustment and the rôle of
prejudice in its function. It was previously mentioned in this
study that the Anglos at the four-year level had less acquaintance
with Spanish persons than the Spanish had with Anglos. The
hypothesis was advanced, in explanation of this factor, that some
vicarious experience with suggestive qualities had colored the
Anglo attitude toward the Spanish.

Analysis of each category and the significance of age on its 
development brings out some important differences between the 
Spanish and Anglo groups. The Anglo group showed the 
greatest development in the final dominance of the Anglo over 
the Spanish and in the superiority of the Anglo to the Spanish. 
Contrary to this, the Spanish displayed as the most weighted 
factor in the total prejudice score the manifestations of frustration 
in reacting to the Anglo environment and in ultimate victory of 
the Spanish. Evidently, the Spanish were less assured of their 
superiority than were the Anglo and used aggression as an 
adjustment attempt. This finding was supported by subjective 
evaluation. 

In both the Spanish and Anglo groups, defeat of the character 
with whom the subject identified himself in the stories was a 
manifestation of prejudice since it occurred more frequently in 
the Spanish-Anglo situations than in the same group situations. 
The origin of its development, however, is little understood. 
In both ethnic groups it appeared as an almost constant positive 
factor throughout the age levels. Evidently, the genesis of 
this factor was at a very early age and it reached the twelve-year 
maximum before the age of four. 

The data indicated that the Anglos were less optimistic about
the Spanish-Anglo relationship than were the Spanish. The
Anglos appeared to be the aggressors and the Spanish merely
to be attempting to adjust to this aggression. The comparison of
satisfactory and unsatisfactory endings of the themes, the early
development of Anglo attitude, superiority and different rates
of prejudice development would demonstrate this. These
criteria indicated how the instilled attitude of one group may
contribute toward the attitudinal development of the other.

The younger subjects at the given age levels were analyzed 
independently and compared with the older subjects from the 
identical age group to ascertain the effect of one year on atti- 
tudinal development. The results indicated that, primarily 
at the four-year level, a measurable increment in prejudice was 
manifested in both ethnic groups. The data suggested that the 
origin of Spanish prejudice appeared to be very close to the three- 
and-one-half-year level while the genesis of the Anglo appeared 
somewhat before this time. 

An interesting indication of accentuated prejudice was mani- 
fested upon analysis of six subjects who had one Spanish and 
one Anglo parent. Though the number was small, it was found 
that the subject’s frustration and concomitant aggression in the 
situations depicted were higher than the total for the specific 
age group. The general prejudice score, too, was higher toward 
one ethnic group or the other. An attempt was made to ascer- 
tain the factors contributory to swing direction. 

In one situation, it was found that two brothers (eight and 
twelve years of age) who lived together had opposite swings of 
prejudice. It was tentatively concluded with reference to this 
one situation, at least, that the subjects with one Anglo and one 
Spanish parent were prejudiced in the same direction as the 
parent with whom they identified themselves. This was borne 
out by subjective evaluation and may possibly explain the rather 
high relationship manifested between parent dominance and 
attitude. It was also noted that there was a tendency for the 
subject who had identified himself with the mother to have the 
girl as heroine in the boy-girl situations of the Projective Test 
of Racial Attitudes. 

The accentuated aggression pattern might be attributed to
tension resulting from conflicting tendencies.


VERBAL INTELLIGENCE AND EFFECTIVENESS
OF PARTICIPATION IN GROUP DISCUSSION*


NORMAN E. GREEN 
Major, USAF, Craig Air Force Base, Selma, Alabama 


This report is a part of a broad study of the factor of verbal 
intelligence or vocabulary power in Air Force teacher training. 
The investigation is being made at the Academic Instructor 
Division in the USAF Special Staff School of the Air University 
where a six-week instructor training course is presented. The 
test groups for the over-all study were two classes of Air ROTC 
instructor trainees, totaling two hundred fifty-two members. 
The population for this part of the study was one class of one 
hundred twelve of the Air ROTC instructor trainees. 

Because conference methods of instruction were being taught, 
it was felt that it would be valuable to identify some of the factors 
which contribute to effectiveness of participation. Since it 
seems logical that vocabulary power might have some bearing 
on success of conferees, this factor was investigated. In addi- 
tion, as mentioned below, instruction was carried out in small 
groups throughout the course, and the opinion was that for 
purposes of this school, the groups should be arranged hetero- 
geneously according to indicated effectiveness or capacity to 
succeed. Furthermore, an organismic interpretation of qualitative
learning in coöperative group activities demands a heterogeneous
arrangement of experiences. Such a situation gives
potential basis for optimum interaction and individual growth.
Thus, if verbal intelligence could be found to have relationship 
to effectiveness of participation, it would have value as a criterion 
for arranging the groups in a manner which tends toward maxi- 
mum learning. The problem represents an attempt to isolate 
and describe the relationship of one factor within a ‘field’ sur- 
rounding an organic group learning process. 

The major part of the instruction and all of the practice-teaching
exercises within the Academic Instructor Course are
conducted in groups of about ten student-instructors. In the
unit of instruction called Conference Methods of Teaching, with
which this paper is concerned, there were seven groups of ten
members each and two groups of nine members each. These
groups were in session for approximately ten hours of
problem-solving discussion centering about such issues (chosen by
the students) as, “What Can Be Done to Improve the Airman
Information Program?”; “How Can We Curb Sex Laxity in American
Colleges?”; “What Improvements Can We Make in Evaluation
of Officer Effectiveness?” During the ten hours, each group
was under the guidance and observation of one faculty-adviser
for the first half of the session and another faculty-adviser for
the last half of the session.

The procedure, then, was to organize each one of the nine 
groups of conferees for the ten hours of problem-solving discus- 
sion heterogeneously according to scores obtained on two verbal 
intelligence tests. The tests used were twenty-word, steeply- 
graded, multiple-choice type. The words were taken from the
I. E. R. Intelligence Scale CAVD. These were the same
instruments reported on by Thorndike and Gallup in a survey of the
American voting population,¹ and earlier by Thorndike alone.²
This system of determining group membership, having a balanced
number of high, intermediate, and low scores in each group, thus
equalized the groups as far as this yardstick of verbal intelligence
was concerned.
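
The text does not say exactly how the high, intermediate, and low
scorers were distributed. As a minimal sketch only, assuming a
serpentine deal of the score-ranked roster (the function name
`balanced_groups`, the trainee labels, and the scores below are
all hypothetical), the balancing might look like this:

```python
def balanced_groups(scores, n_groups):
    """Deal a score-ranked roster into groups in serpentine order.

    This is an assumed procedure, not the one documented in the study.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    groups = [[] for _ in range(n_groups)]
    for i, (name, _score) in enumerate(ranked):
        rnd, pos = divmod(i, n_groups)
        # Reverse direction on every other pass so no group
        # always receives the highest remaining scorer.
        idx = pos if rnd % 2 == 0 else n_groups - 1 - pos
        groups[idx].append(name)
    return groups


# Hypothetical vocabulary scores for twelve trainees.
roster = {
    f"trainee_{i:02d}": s
    for i, s in enumerate([38, 36, 35, 33, 31, 30, 29, 27, 26, 24, 22, 20])
}
groups = balanced_groups(roster, 3)
means = [sum(roster[name] for name in g) / len(g) for g in groups]
```

Each group here receives one scorer from every stratum of the
ranked list, so the group means differ only slightly.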

There were twenty-four judges involved in the experiment. 
These were regular staff members of the school. Each one had 
had considerable experience with conference methods of instruc- 
tion, both as participant and as critic teacher. Two observed 
each group. During the time they were observing the groups in 
discussion, the faculty-advisers did not know that they would 
later be asked to give their opinions about the effectiveness of 
the conferees. Nor did they have knowledge of the vocabulary 
scores received by the group members. 

On the day when this discussion phase was completed, each
instructor involved was given a written memorandum on which
he was asked to submit the names of three student-instructors
whom he considered to be most effective as conferees during the
periods he observed the group. In order to determine this
judgment, they were asked to concentrate upon those who
participated most frequently and at the same time gave good
quality contributions. They were to name these three conferees in
order of effectiveness. In addition, they were asked to submit
the names of three conferees whom they considered to be least
effective. Reticence was the principal criterion for determining
this. The three least effective were also to be named in numerical
order. It is the writer’s opinion that, in view of the extended
experience of the faculty-advisers with this conference method,
the requested criterion of effectiveness of participation was a
sound one. The judges were particularly aware of the dangers
which develop through dominance of the discussion by one or a
few members. And they recognized the benefits of constructive
contributions, frequently termed ‘committee work.’

¹ “Verbal Intelligence of the American Adult,” R. L. Thorndike and
G. H. Gallup, Journal of General Psychology, Vol. 30, Jan., 1944.
² “Two Screening Tests of Intelligence,” R. L. Thorndike, Journal of
Applied Psychology, Vol. 26, 1942.

When the names of the most effective and least effective par- 
ticipants in each of the twelve groups were received, a point 
scoring system was established. A conferee named most effec- 
tive of the entire group received a score of plus 3; one named 
second most effective received a score of plus 2; and the one 
named third most effective of each group received a score of plus 
1. The same scoring was used for the least effective members, 
except that the figures were negative. Thus, if a conferee was 
named by each of the two judges (who worked independently) 
as the most effective of a particular group, he would receive the 
highest possible score, a plus 6.
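
The point system just described can be sketched as follows. The
function name `effectiveness_scores`, the ballot layout, and the
conferee names are hypothetical; only the plus 3, 2, 1 and minus
3, 2, 1 weights come from the text.

```python
from collections import defaultdict

POINTS = (3, 2, 1)  # weights for the first, second, and third name listed


def effectiveness_scores(ballots):
    """Sum the judges' nominations into one score per conferee.

    ballots: one (most_effective, least_effective) pair per judge,
    each element a tuple of three names in the order the judge gave.
    """
    scores = defaultdict(int)
    for most, least in ballots:
        for name, pts in zip(most, POINTS):
            scores[name] += pts      # named among the most effective
        for name, pts in zip(least, POINTS):
            scores[name] -= pts      # named among the least effective
    return dict(scores)


# Hypothetical ballots from the two judges who observed one group.
judge_a = (("Adams", "Baker", "Clark"), ("Young", "Wells", "Vance"))
judge_b = (("Adams", "Clark", "Davis"), ("Young", "Vance", "Upton"))
scores = effectiveness_scores([judge_a, judge_b])
```

A conferee named first by both judges reaches the maximum of plus 6,
and one named least effective by both reaches minus 6; conferees
named by neither judge do not appear in the result.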

This scoring system, later used in the refinement of the data,
permitted an equitable discrimination among levels of
effectiveness in each group. It was thus possible to (1) pick out
the three most effective in each discussion group, (2) pick out
the three least effective in each group, (3) pick out the topmost
effective participant in each group, (4) pick out the one
representing least effectiveness in each group, (5) arrange the
members of each group in rank order of effectiveness, and
(6) through an adjustment of the raw scores, compute a
product-moment correlation coefficient between effectiveness and
verbal intelligence for the whole group of fifty-four participants
who were named on the effectiveness scale.

From this point on, the experiment and treatment of the data
were further organized into two phases. The first was an
attempt to establish a statistical relationship between verbal
intelligence and effectiveness of conferees. This included the
assembling of the vocabulary scores of those named in the top
bracket of effectiveness in all the groups (N = 27) and all those
named in the least effective bracket (N = 27). It also included
assembling the vocabulary scores of the nine conferees named
first in effectiveness of participation and the scores of those nine
conferees who were named as representing the least effective
member of each group.

Table I presents the statistics for two larger groups, and shows 
the significance of the difference between the two means. 


TABLE I. SIGNIFICANCE OF THE DIFFERENCE BETWEEN THE
MEAN VOCABULARY SCORE OF THE MOST EFFECTIVE
PARTICIPANTS IN GROUP DISCUSSION AND THE MEAN
VOCABULARY SCORE OF THE LEAST EFFECTIVE
PARTICIPANTS

                                          Mean           Standard
                                    N     Vocab. Score   Deviation
Most Effective (X) ...............  27    29.77          4.4
Least Effective (Y) ..............  27    26.29          3.9
Difference between Means .................  3.48
Standard Error of Difference .............  1.14
Difference / Standard Error ..............  3.05


The null hypothesis, that there is no true difference in verbal
intelligence as measured by this yardstick between the most
effective and least effective participants, is rejected at the .01
level of significance.
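
The figures in Table I can be checked with a short calculation.
The formula for the standard error is not given in the text; the
sketch below assumes the usual expression for two independent
means, sqrt(sd_x^2/n_x + sd_y^2/n_y), and the function name
`critical_ratio` is ours.

```python
import math


def critical_ratio(mean_x, sd_x, n_x, mean_y, sd_y, n_y):
    """Return (difference, standard error, ratio) for two independent means.

    Assumes SE = sqrt(sd_x^2 / n_x + sd_y^2 / n_y); the source does not
    state which formula was actually used.
    """
    diff = mean_x - mean_y
    se = math.sqrt(sd_x ** 2 / n_x + sd_y ** 2 / n_y)
    return diff, se, diff / se


# Means, standard deviations, and N as printed in Table I.
diff, se, ratio = critical_ratio(29.77, 4.4, 27, 26.29, 3.9, 27)
# The results land close to the tabled 3.48, 1.14, and 3.05; small
# gaps are expected because the printed SDs are rounded.
```

A ratio above 2.58 corresponds to the two-tailed .01 level under the
normal curve, which is consistent with the rejection stated above.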

In dealing with the smaller groups in a similar manner, the 
small sample theory and formula were employed in the refine- 
ment of the data to determine the statistical significance of the 
actual difference of six points between the means of the two 
groups as shown in Table II. 

In this case, the difference is significant at the .03 level.
Results obtained from the two treatments of the data show that
a true difference exists in the groups and the difference is not due
to chance.

TABLE II. SIGNIFICANCE OF THE DIFFERENCE BETWEEN THE
MEAN VOCABULARY SCORE OF THE TOPMOST EFFECTIVE
PARTICIPANTS OF EACH DISCUSSION GROUP AND THE
MEAN VOCABULARY SCORE OF THE PARTICIPANTS
SHOWING MINIMUM EFFECTIVENESS

                                                  Mean           Standard
                                            N     Vocab. Score   Deviation
Participants with Maximum Effectiveness ..  9     31.66          4.4
Participants with Minimum Effectiveness ..  9     25.66          5.3
Difference between Means .........................  6.0
Standard Error of Difference .....................  2.43
[row label illegible] ............................  2.88

Thus, from the first phase of the study it was evident that a
statistical relationship existed. In phase two, an attempt was
made to discover more about the degree of relationship.

Table III represents a breakdown of each of the nine groups 
showing that within each (except group #9), there was a difference 
in ‘vocabulary power’ consistently in favor of those named as 
effective participants. The range of the differences is from zero 
difference in group 9 to a difference of 7.67 points in group 2. 
The mean difference is 3.48. 
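
The range and mean just quoted follow directly from the group
means printed in Table III. A small check, with variable names of
our own choosing:

```python
# Group means as printed in Table III (groups 1 through 9).
most_effective = [31.67, 32.33, 31.67, 30.66, 29.00, 28.33, 27.66, 30.33, 26.33]
least_effective = [27.33, 24.66, 28.00, 24.66, 24.00, 27.66, 27.00, 27.00, 26.33]

# Difference in favor of the most effective participants, per group.
differences = [round(m - l, 2) for m, l in zip(most_effective, least_effective)]

mean_difference = round(sum(differences) / len(differences), 2)
largest_group = differences.index(max(differences)) + 1    # 1-based group number
smallest_group = differences.index(min(differences)) + 1
```

The mean of the nine differences also equals the difference between
the two means in Table I, as it should when every group contributes
three participants to each bracket.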


TABLE III. DIFFERENCE IN MEAN VOCABULARY SCORES OF
MOST EFFECTIVE AND LEAST EFFECTIVE PARTICIPANTS IN
NINE DISCUSSION GROUPS

Group                     1      2      3      4      5      6      7      8      9
Mean Vocab. Score
  of Most Effective     31.67  32.33  31.67  30.66  29.00  28.33  27.66  30.33  26.33
Mean Vocab. Score
  of Least Effective    27.33  24.66  28.00  24.66  24.00  27.66  27.00  27.00  26.33
Difference               4.34   7.67   3.67   6.00   5.00   0.67   0.66   3.33   0.0


Table IV presents, for each discussion group, the rank-difference
correlation coefficient between the members’ effectiveness
scores and their scores in verbal intelligence. All groups had
ten members except groups 7 and 9, which had nine members
each. Although all coefficients are positive in a range from
+.09 to +.57 (mean coefficient is +.32), the small samples
involved precluded any proof of statistical significance.

TABLE IV. RANK-DIFFERENCE CORRELATION COEFFICIENTS
BETWEEN VERBAL INTELLIGENCE AND EFFECTIVENESS OF
PARTICIPATION FOR NINE DISCUSSION GROUPS

Group                     1      2      3      4      5      6      7      8      9
Rank-Diff. Correlation
  Coefficient           +.29   +.57   +.36   +.40   +.52   +.10   +.27   +.30   +.09
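
For readers who wish to retrace the rank-difference method, a
minimal sketch follows. It uses the familiar formula
rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)); giving tied scores their
average rank is our assumption, since the text does not say how
ties (for example, conferees named by neither judge) were handled.
The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (largest) downward; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_difference_rho(x, y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# The nine coefficients printed in Table IV and their mean.
coefficients = [0.29, 0.57, 0.36, 0.40, 0.52, 0.10, 0.27, 0.30, 0.09]
mean_coefficient = round(sum(coefficients) / len(coefficients), 2)
```

With only nine or ten pairs per group, a coefficient must be quite
large before it can be distinguished from zero, which is why the
individual group values are not tested here.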

In the final treatment of the data, the effectiveness scores of all 
fifty-four participants representing the top and bottom brackets 
of each group were paired with the vocabulary scores in a compu- 
tation of product-moment correlation. This coefficient was 
found to be +.45, with a standard error of .10, representing a 
correlation value significantly larger than zero at the .01 level. 
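
The reported standard error can be approximated as below. The text
does not name the formula; this sketch assumes the classical
large-sample expression (1 - r^2) / sqrt(N), and the function name
is ours.

```python
import math


def standard_error_of_r(r, n):
    """Classical large-sample standard error of r: (1 - r^2) / sqrt(n).

    Assumed formula; the source reports only the resulting value.
    """
    return (1 - r * r) / math.sqrt(n)


r, n = 0.45, 54                  # coefficient and N reported in the text
se = standard_error_of_r(r, n)   # lands near the reported .10
ratio = r / se                   # compared with 2.58 for the .01 level
```

Under this assumption the coefficient is roughly four times its
standard error, in line with the significance level stated above.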

Although these correlations are not of high order, they indicate 
that a positive relationship is present to at least a fair degree. 
A comprehensive interpretation would not fail to take cognizance 
of other seemingly obvious factors having influence on one’s 
ability to participate effectively. Some of these might be 
familiarity with the topic of discussion, interpersonal relation- 
ships in the group, motivation and general interest. If these 
factors also are of importance, then the consistent positive sta- 
tistics in favor of verbal intelligence may give greater reason 
for attaching a relative value to this latter variable as one 
concomitant. 

These results might then be construed to demonstrate that 
verbal intelligence was a contributing factor to effective par- 
ticipation. Practically speaking, this means that the adminis- 
tration of these instruments and the use of the scores may serve 
as a valuable aid in setting up small learning groups balanced on 
the basis of the ability of the conferees to participate. A con- 
tinuation of the investigation of effectiveness in group processes 
must be directed toward study of other probable attending 
factors indicated above. Beyond this, further research may 
reveal additional educationally significant relationships between 
scores on these tests of verbal intelligence and capacity to per- 
form successfully in other teaching-learning practices. 








BOOK REVIEWS 


WILLARD L. VALENTINE AND DELOS D. WICKENS. Experimental
Foundations of General Psychology. Third Ed. New York:
Rinehart and Co., 1949, pp. 472.


This third edition is a rather thorough revision of a text which 
has had wide usage since the appearance of the first edition. 
Nevertheless, the general pattern of the text is the same. In 
each chapter there is a preliminary orientation to the subject 
matter and problems plus definitions and in some cases method- 
ology. This is followed by selected investigations in condensed 
form which usually include introduction and hypothesis, method, 
results and a summary of interpretations. There is also a 
summary of the chapter. Throughout the book there is some 
emphasis upon correcting misconceptions about the nature of 
psychology, and the inclusion of materials which have a part in 
“determining the theoretical structure of modern psychology.” 

In this revision there is a new chapter on personality. Most 
of the material on perception is new, and the chapters on condi- 
tioning and learning are drastically revised. In most of the other 
chapters new experiments have been added or substituted for 
material omitted. In all the revision the selection of material 
and discussions have been centered on the more recent trends in 
the field. In particular there are new materials on social reac- 
tions and reports of investigations done by military psychologists. 

It is stated that the book is “concerned principally with
psychological facts and principles on which there is a majority, if
not a universal, agreement.” Some sections found in psychology
texts are not included here. No particular system of psychology 
is emphasized. There is, however, emphasis upon utility and 
applications. In general, the materials indicate how a psycholo- 
gist collects information he needs and how he interprets the data 
collected. The book is designed as a supplement to standard 
texts. 

Most readers will agree that the selection of experiments has 
been done with discrimination. Nevertheless some may consider 
that in too many instances animal rather than human
experiments were chosen. Furthermore, if contemporary trends are
to be emphasized, the material on perception could have been
both broader and more extensive.

The material throughout the book is interesting and is
concisely and clearly presented. It would appear that the avowed
aims have been adequately achieved. It is noteworthy that the 
figures and typography are excellent. There can be little doubt 
that this revision will be popular and enjoy wide usage. 

MILES A. TINKER


University of Minnesota 


Susan Deri. Introduction to the Szondi Test. New York: Grune 
and Stratton, 1949, pp. 354. 


Mrs. Deri’s familiarity with the Szondi test extends over a 
period of about eleven years, the first four of which were spent 
working closely with Dr. Lipot Szondi in the early applications 
of this method to personality diagnosis. Dr. Szondi’s highly 
laudatory Foreword is confirmation of the authority with which 
the author writes. 

The Szondi test consists of forty-eight photographs of mental 
patients. These fall into six sets, each containing eight pictures:
one each of a homosexual, a sadist, an epileptic, an hysteric, a 
catatonic schizophrenic, a paranoid schizophrenic, a manic 
depressive depressive, and a manic depressive manic. One set 
of photographs is displayed to the subject at a time. From each 
set the subject selects the two pictures he likes most and the two 
he dislikes most. When this process has been completed, the 
subject chooses from among the twelve ‘most liked’ the four 
for which he has the greatest liking and from among the twelve 
disliked the four arousing the greatest aversion. These responses 
are transferred to an appropriate graphic record sheet. The test 
must be repeated “at least six, preferably ten, times, with at
least one day intervals between administrations, to be able to 
give a valid clinical interpretation of the personality.” An 
optional extension, entitled the ‘factorial association experiment,’ 
involves asking the subject to tell stories about each of the eight 
pictures finally chosen. 

The major portion of the book is devoted to an exposition of the 
interpretation of scores. Need-tension is regarded as the driving 
force “in the sense of directing the person to perform certain acts
and to choose or avoid certain objects.” The type of activity
or object depends upon the kind of need. “Its field of application
is again similar to that of other projective technics; in other
words, as a diagnostic instrument for clinical use or for the 
interpretation of the so-called normal personality, vocational 
guidance, experimental social psychology and a variety of fields 
of research.”

Mrs. Deri avoids an expression of attitude toward the
gene-theory which this test was originally intended to ‘prove
experimentally.’ According to Szondi’s theory, “the mental disorders
represented in the test are of genetic origin and the subject’s 
emotional reactions to these photographs were believed to depend 
upon some sort of similarity between the gene-structure of the 
patient represented by the photograph and that of the subject 
reacting to the photograph.” Mrs. Deri takes the position that
the test’s effectiveness has been established whether or not the 
early hypothesis is accepted. 

To the reader who is accustomed to look for experimental data 
in support of claims of usefulness or validity, this volume is 
wholly inadequate. Elaborate explanations of the significance 
of various factors are given without even frequency tables.
Sums, differences and ratios are computed and interpreted with- 
out thought for the problems of reliability involved. Shifts in 
type preference or aversion from one day to the next are regarded 
as indications of changes in the needs and tensions of the subject 
rather than errors of measurement. Validity in the usual sense 
appears to be dismissed in the following paragraph. 

“The superficial appearance of normalcy is responsible for the 
extreme difficulties inherent in the problem of validating studies 
on the basis of observable behavior or verbal or written ques- 
tionnaires. Many basically unhappy individuals who are unable 
really to become emotionally attached to any person or object 
would rate extremely high on a written adjustment inventory, or 
on the basis of observation.” 

This reviewer can only conclude that, on the basis of Mrs. 
Deri’s book, the Szondi test is one more unproven instrument 
proposed as an aid in personality diagnosis. Pending the publi- 
cation of adequate, supporting evidence, the psychologist using 
this method is proceeding on the basis of faith and should be
willing to recognize the peril of his course.

GEORGE K. BENNETT


The Psychological Corporation 